I know how beam search works: at each decoder step, we keep the top k results and continue decoding with them. What I want to ask is whether beam search is applied only at test time, or at both training and test time?
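To make the "keep the top k results at each step" part concrete, here is a minimal sketch of the decoding loop in plain Python. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for a trained decoder's next-token distribution, and the names `beam_search`, `beam_width`, and `max_len` are made up here, not taken from any library. Note that the loop only queries the model for scores and never updates it.

```python
import math

def toy_model(prefix):
    """Hypothetical stand-in for a trained decoder (an assumption, not a real model).

    Returns a dict mapping each possible next token to its log-probability,
    given the tokens decoded so far.
    """
    if len(prefix) == 1:              # right after the start token
        probs = {"a": 0.6, "b": 0.4}
    elif len(prefix) == 2:            # second step depends on the first choice
        probs = {"x": 0.5, "y": 0.5} if prefix[-1] == "a" else {"x": 0.9, "y": 0.1}
    else:                             # then the sequence always ends
        probs = {"</s>": 1.0}
    return {tok: math.log(p) for tok, p in probs.items()}

def beam_search(step_log_probs, start_token, end_token, beam_width, max_len):
    """Return the best (tokens, log_prob) pair found with beam width k."""
    beams = [([start_token], 0.0)]    # live hypotheses: (tokens, cumulative log-prob)
    finished = []                     # hypotheses that have emitted end_token

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for tokens, score in beams:
            for tok, log_p in step_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + log_p))

        # Keep only the k highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    finished.extend(beams)            # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])

greedy_tokens, greedy_score = beam_search(toy_model, "<s>", "</s>", beam_width=1, max_len=5)
beam_tokens, beam_score = beam_search(toy_model, "<s>", "</s>", beam_width=2, max_len=5)
print(greedy_tokens, math.exp(greedy_score))
print(beam_tokens, math.exp(beam_score))
```

With `beam_width=1` this reduces to greedy decoding and commits to "a" at the first step, while `beam_width=2` keeps "b" alive long enough to find the higher-probability sequence in this toy distribution.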
