Beam search is often used in sequence generation tasks like translation or text generation. What does beam search primarily help with?
Think about how beam search keeps track of multiple candidates but limits the number to avoid huge computation.
Beam search keeps a fixed number (the beam width) of the best-scoring partial sequences at each step, balancing search quality against computation. It neither explores all possible sequences exhaustively nor samples them randomly.
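The idea above can be sketched in a few lines. This is a minimal illustration over a hand-coded toy probability table; `TOY_MODEL`, its probabilities, and the `<s>` start token are assumptions for the example, not any real model or library API.

```python
import math

# Toy "model": maps the last token of a prefix to (next token, log prob) pairs.
TOY_MODEL = {
    "<s>": [("I", math.log(0.6)), ("You", math.log(0.4))],
    "I": [("am", math.log(0.9)), ("was", math.log(0.1))],
    "You": [("are", math.log(0.8)), ("were", math.log(0.2))],
}

def beam_search(beam_width, steps):
    # Each beam entry is (token list, cumulative log probability).
    beams = [(["<s>"], 0.0)]
    for _ in range(steps):
        expanded = []
        for seq, score in beams:
            for tok, lp in TOY_MODEL.get(seq[-1], []):
                expanded.append((seq + [tok], score + lp))
        if not expanded:
            break
        # Keep only the beam_width highest-scoring partial sequences.
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    # Drop the start token for readability.
    return [(" ".join(seq[1:]), score) for seq, score in beams]

print(beam_search(beam_width=2, steps=2))
# the two surviving beams are "I am" and "You are"
```

Note that at every step all current beams are expanded, but only `beam_width` survive the sort-and-truncate, which is exactly the exploration/efficiency tradeoff described above.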
Given the following partial sequences and their log probabilities, what are the top 2 sequences after expanding one step?
partial_sequences = [("I am", -1.0), ("You are", -1.2)]
candidates = {
    "I am": [("happy", -0.5), ("sad", -1.5)],
    "You are": [("kind", -0.3), ("mean", -2.0)],
}
beam_width = 2
Calculate the new sequences with summed log probabilities and pick top 2.
Add the log probabilities of partial sequences and their expansions, then pick the top 2 with highest (least negative) sums.
Calculate the sums: 'I am happy' = -1.0 + (-0.5) = -1.5; 'I am sad' = -1.0 + (-1.5) = -2.5; 'You are kind' = -1.2 + (-0.3) = -1.5; 'You are mean' = -1.2 + (-2.0) = -3.2. The top 2 are 'I am happy' and 'You are kind', both at -1.5.
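The worked example can be checked directly in code. This sketch reuses the data above and performs one expand-score-truncate step:

```python
# One beam-search step on the example data: expand each partial sequence,
# sum log probabilities, and keep the top beam_width results.
partial_sequences = [("I am", -1.0), ("You are", -1.2)]
candidates = {
    "I am": [("happy", -0.5), ("sad", -1.5)],
    "You are": [("kind", -0.3), ("mean", -2.0)],
}
beam_width = 2

expanded = [
    (f"{prefix} {token}", score + lp)
    for prefix, score in partial_sequences
    for token, lp in candidates[prefix]
]
top = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam_width]
print(top)  # [('I am happy', -1.5), ('You are kind', -1.5)]
```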
You have a neural machine translation model. You want to balance translation quality and decoding speed. Which beam width is most suitable?
Think about typical beam widths used in practice for good quality without too much slowdown.
Beam width 1 reduces to greedy decoding: fast but less accurate. Very large beam widths slow decoding drastically for little quality gain. A beam width of around 5 is commonly used in practice to balance quality and speed. Beam width 0 is invalid.
In an experiment, increasing beam width from 1 to 10 affects BLEU score and decoding time. Which statement is true?
Consider how beam search explores more sequences with larger beam widths and the tradeoff involved.
Increasing the beam width explores more candidates, which often improves BLEU at first but with diminishing returns; at large widths BLEU can even degrade slightly, because reducing search errors exposes the model's own biases (e.g. toward overly short hypotheses). Decoding time grows roughly linearly with beam width.
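The linear time growth follows from a back-of-the-envelope cost model: with beam width k and vocabulary size V, each decoding step scores roughly k * V candidate expansions. The vocabulary size below is an assumed, illustrative number:

```python
def candidates_per_step(beam_width, vocab_size=32000):
    # Each of the beam_width partial sequences is expanded with every
    # vocabulary token, so per-step scoring work is beam_width * vocab_size.
    return beam_width * vocab_size

for k in (1, 5, 10):
    print(f"beam width {k}: ~{candidates_per_step(k)} candidate scores per step")
```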
A sequence generation model using beam search often outputs repetitive phrases like 'the the the'. What is the most likely cause?
Think about how beam search picks sequences with highest probabilities and how model biases affect output.
Beam search selects the sequences with the highest probabilities, so if the model assigns high probability to repeated tokens, beam search amplifies the repetition. This is a known failure mode, typically mitigated with techniques such as a repetition penalty or a coverage penalty.
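One simple mitigation can be sketched as follows. The function name and the specific scheme (subtracting log(penalty) from the scores of already-generated tokens) are illustrative assumptions, not a particular library's API:

```python
import math

def apply_repetition_penalty(token_log_probs, generated_tokens, penalty=1.2):
    # Reduce the log probability of any token that already appears in the
    # generated sequence; penalty > 1 discourages beams from repeating it.
    return {
        tok: lp - math.log(penalty) if tok in generated_tokens else lp
        for tok, lp in token_log_probs.items()
    }

scores = {"the": -0.2, "cat": -1.4}
penalized = apply_repetition_penalty(scores, generated_tokens={"the"})
# 'the' drops below its original score; 'cat' is untouched
```

Applied at every decoding step before the beams are re-ranked, this makes a repeated token less likely to dominate the surviving hypotheses.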