Recall & Review
beginner
What is beam search decoding in NLP?
Beam search decoding is a method to find the most likely sequence of words by exploring multiple options at each step, keeping only the best few sequences (called beams) instead of just one.
beginner
Why is beam search better than greedy search?
Beam search keeps multiple candidate sequences at each step, so it can avoid early mistakes that greedy search makes by choosing only the single best option at each step.
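The early-mistake problem can be seen on a tiny example. The sketch below uses a hypothetical two-step probability table (the tokens and numbers are made up for illustration) and compares the greedy path with the best path found by checking every sequence. A beam of width 2 would keep both first tokens after step 1, so it could still reach the better sequence.

```python
# Hypothetical toy model: P(first token) and P(second token | first token).
first = {"a": 0.6, "b": 0.4}
second = {
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
}

def greedy():
    """Pick the single most likely token at each step."""
    t1 = max(first, key=first.get)
    t2 = max(second[t1], key=second[t1].get)
    return (t1, t2), first[t1] * second[t1][t2]

def best_overall():
    """Score every two-token sequence and return the most likely one."""
    candidates = {
        (t1, t2): first[t1] * p2
        for t1 in first
        for t2, p2 in second[t1].items()
    }
    best = max(candidates, key=candidates.get)
    return best, candidates[best]

greedy_seq, greedy_p = greedy()        # commits to "a" and ends near 0.30
best_seq, best_p = best_overall()      # ("b", "x") scores about 0.36
print(greedy_seq, greedy_p)
print(best_seq, best_p)
```

Greedy search commits to "a" because 0.6 > 0.4, and never sees that "b" leads to a stronger continuation.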
intermediate
What does the beam width control in beam search decoding?
Beam width controls how many candidate sequences are kept at each step. A larger beam width means more sequences are considered, which can improve results but takes more time.
intermediate
How does beam search handle sequence probabilities during decoding?
Beam search scores each candidate by multiplying the probabilities of its words or, equivalently, adding their log probabilities. It keeps the sequences with the highest total scores at each step.
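A short sketch of why the two scoring forms agree, and why log probabilities are usually preferred. The probability values are arbitrary illustration numbers, not output from a real model.

```python
import math

# Arbitrary per-word probabilities for one candidate sequence.
probs = [0.6, 0.5, 0.9]

product = math.prod(probs)                       # multiply raw probabilities
log_score = sum(math.log(p) for p in probs)      # add log probabilities

# Both forms rank candidates the same way: exp(sum of logs) == product.
print(product, math.exp(log_score))

# For long sequences the raw product underflows to 0.0,
# while the log-probability sum stays a usable finite number.
long_probs = [0.01] * 200
underflowed = math.prod(long_probs)
long_log_score = sum(math.log(p) for p in long_probs)
print(underflowed, long_log_score)
```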
intermediate
What is a common drawback of beam search decoding?
Beam search can still miss the best sequence if the beam width is too small, and it can be slower than greedy search because it keeps multiple candidates.
What does beam search keep track of during decoding?
A. Multiple best candidate sequences
B. Only the single best sequence
C. Random sequences
D. All possible sequences
Beam search keeps multiple best candidate sequences (beams) at each step to explore more options.
What happens if you increase the beam width in beam search?
A. The model trains faster
B. Fewer sequences are considered, speeding up decoding
C. More sequences are considered, improving accuracy but increasing computation
D. The output becomes random
Increasing beam width means more candidate sequences are kept, which can improve results but requires more computation.
Which of these is a key difference between greedy search and beam search?
A. Beam search ignores probabilities
B. Greedy search keeps one best sequence; beam search keeps multiple
C. Greedy search is slower than beam search
D. Beam search only works for images
Greedy search keeps only the single best sequence at each step, while beam search keeps multiple candidates.
How does beam search score candidate sequences?
A. By multiplying word probabilities (or summing their log probabilities)
B. By counting word length
C. By random selection
D. By alphabetical order
Beam search scores sequences by combining the probabilities of each word, often using log probabilities for numerical stability.
What is a potential downside of using a very small beam width?
A. It will always find the best sequence
B. It will generate random outputs
C. It will be too slow
D. It may miss better sequences and behave like greedy search
A small beam width limits the search and can cause the method to miss better sequences, similar to greedy search.
Explain how beam search decoding works and why it is used instead of greedy search.
Think about how exploring more options helps find better sentences.
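One way to make the mechanics concrete is a minimal sketch of the decoding loop. Everything here is an illustrative assumption: `beam_search`, `toy_model`, the `<eos>` marker, and the probability table are hypothetical names and values, and a real decoder would call a trained model instead of a lookup table.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]

def beam_search(next_probs: Callable[[Tokens], Dict[str, float]],
                beam_width: int, max_len: int,
                eos: str = "<eos>") -> List[Tuple[Tokens, float]]:
    """Return up to beam_width (sequence, log-probability) pairs, best first."""
    beams: List[Tuple[Tokens, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))   # finished: carry over unchanged
                continue
            for token, p in next_probs(seq).items():
                if p > 0.0:
                    candidates.append((seq + (token,), score + math.log(p)))
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams

def toy_model(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical next-token distribution, hard-coded for illustration."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    return table.get(prefix, {"<eos>": 1.0})

greedy_like = beam_search(toy_model, beam_width=1, max_len=3)
wider = beam_search(toy_model, beam_width=2, max_len=3)
print(greedy_like[0])   # width 1 follows the greedy path through "a"
print(wider[0])         # width 2 keeps "b" alive and finds the better path
```

With `beam_width=1` the loop keeps a single hypothesis per step, which is exactly greedy search; with `beam_width=2` the second-ranked first token survives long enough to win.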
Describe the trade-offs involved in choosing the beam width for beam search decoding.
Consider speed versus quality of results.
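The speed side of the trade-off can be estimated with a rough count of how many candidate extensions get scored. This is a back-of-the-envelope sketch, assuming every kept sequence is expanded over the full vocabulary at every step and no sequence finishes early; `candidates_scored` is a hypothetical helper, and real decoders batch this work on hardware, so wall-clock time need not grow in exact proportion.

```python
def candidates_scored(beam_width: int, vocab_size: int, steps: int) -> int:
    """Rough count of (prefix, token) pairs scored during decoding."""
    if steps <= 0:
        return 0
    # Step 1 expands one empty prefix; each later step expands beam_width prefixes.
    return vocab_size + (steps - 1) * beam_width * vocab_size

for width in (1, 5, 20):
    print(width, candidates_scored(width, vocab_size=1000, steps=10))
```

Under these assumptions the work grows roughly linearly with beam width, while any quality gain depends on whether the extra candidates contain a better sequence.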
Practice
1. What is the main purpose of beam search decoding in natural language processing?
easy
A. To keep track of multiple best candidate sequences during prediction
B. To randomly select words for output generation
C. To generate only one possible output sequence
D. To speed up training by skipping steps
Solution
Step 1: Understand beam search goal
Beam search keeps multiple candidate sequences to explore more options than greedy search.
Step 2: Compare options
Only option A describes keeping multiple best guesses; the others describe random choice, a single output, or an unrelated training speed-up.
Final Answer:
To keep track of multiple best candidate sequences during prediction -> Option A
Quick Check:
Beam search = multiple best sequences [OK]
Hint: Beam search tracks several top guesses, not just one [OK]
Common Mistakes:
Confusing beam search with random sampling
Thinking beam search outputs only one sequence
Assuming beam search speeds up training
2. Which of the following is the correct way to describe the beam width in beam search decoding?
easy
A. The size of the vocabulary used for prediction
B. The number of candidate sequences kept at each decoding step
C. The length of the output sequence generated
D. The number of layers in the neural network
Solution
Step 1: Define beam width
Beam width is how many top sequences the algorithm keeps at each step to explore.
Step 2: Eliminate incorrect options
Output length, vocabulary size, and network layers are unrelated to beam width.
Final Answer:
The number of candidate sequences kept at each decoding step -> Option B
Quick Check:
Beam width = candidate count per step [OK]
Hint: Beam width = how many sequences you keep each step [OK]
Common Mistakes:
Mixing beam width with output length
Confusing beam width with vocabulary size
Thinking beam width relates to model architecture
3. Consider a beam search with beam width 2 decoding a sequence. At step 1, the top 2 tokens have scores [0.6, 0.4]. At step 2, each token expands to two tokens with scores: from first token [0.5, 0.3], from second token [0.7, 0.2]. Which two sequences will beam search keep after step 2?
medium
A. [First token + second expansion (0.6*0.3), Second token + second expansion (0.4*0.2)]
B. [First token + first expansion (0.6*0.5), First token + second expansion (0.6*0.3)]
C. [Second token + first expansion (0.4*0.7), Second token + second expansion (0.4*0.2)]
D. [First token + first expansion (0.6*0.5), Second token + first expansion (0.4*0.7)]
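Solution
Step 1: Score every expansion
Multiply each step-1 score by each of its step-2 scores: 0.6*0.5 = 0.30, 0.6*0.3 = 0.18, 0.4*0.7 = 0.28, 0.4*0.2 = 0.08.
Step 2: Keep the top 2
The top two scores are 0.30 and 0.28, corresponding to first token + first expansion and second token + first expansion.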
Final Answer:
[First token + first expansion (0.6*0.5), Second token + first expansion (0.4*0.7)] -> Option D
Quick Check:
Top scores = 0.3 and 0.28 [OK]
Hint: Multiply scores, pick top beam width sequences [OK]
Common Mistakes:
Choosing expansions only from one token
Not multiplying scores correctly
Picking lower scoring sequences
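The arithmetic for this question can be checked with a few lines of Python. The labels `"first"` and `"second"` are placeholder names for the two step-1 tokens; the scores are the ones given in the question.

```python
# Scores from the question: step-1 tokens and their step-2 expansions.
step1 = {"first": 0.6, "second": 0.4}
step2 = {"first": [0.5, 0.3], "second": [0.7, 0.2]}

# Score every (parent token, expansion number) pair by multiplying scores.
candidates = [
    ((parent, i), step1[parent] * s)
    for parent in step1
    for i, s in enumerate(step2[parent], start=1)
]

# Beam width 2: keep the two highest-scoring sequences across all parents.
kept = sorted(candidates, key=lambda c: c[1], reverse=True)[:2]
for name, score in kept:
    print(name, round(score, 2))
```

The kept pair comes from two different parent tokens, which is why choosing expansions from only one token is listed as a common mistake.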
4. You implemented beam search decoding but notice it always returns the same output sequence as greedy decoding, regardless of input. What is the most likely bug?
medium
A. The vocabulary size is too large
B. The model is not trained
C. Beam width is set to 1, making it greedy search
D. The beam search is not normalizing scores
Solution
Step 1: Analyze symptom of identical outputs
Always getting the same output suggests that multiple sequences are not being explored.
Step 2: Identify beam width effect
If beam width = 1, beam search reduces to greedy search, always picking only the highest-scoring token.
Final Answer:
Beam width is set to 1, making it greedy search -> Option C
Quick Check:
Beam width 1 = greedy search [OK]
Hint: Check beam width; 1 means no beam search [OK]
Common Mistakes:
Blaming vocabulary size for output sameness
Ignoring beam width setting
Assuming model training causes identical outputs
5. In a machine translation task, you want to balance output quality and decoding speed. You have a beam search decoder with beam width 5. What happens if you increase the beam width to 20?
hard
A. Output quality may improve but decoding will be slower
B. Output quality will decrease and decoding will be faster
C. Output quality and decoding speed remain the same
D. Decoding speed improves but output quality is unpredictable
Solution
Step 1: Understand beam width effect on quality
Larger beam width explores more sequences, often improving output quality.
Step 2: Understand beam width effect on speed
More sequences to track means more computation, slowing decoding speed.
Final Answer:
Output quality may improve but decoding will be slower -> Option A