Recall & Review
beginner
What is top-k sampling in language generation?
Top-k sampling picks the next word from the top k most likely words predicted by the model. It limits choices to a fixed number, making output more focused but still random.
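As a rough illustration of the idea above, here is a minimal sketch of top-k sampling over a toy distribution. The function name `top_k_sample` and the example words are hypothetical, chosen only for this example; a real model would supply probabilities over its whole vocabulary.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample one word from the k most likely entries of probs (word -> probability)."""
    # Keep only the k highest-probability words.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [prob for _, prob in top]
    # random.choices treats weights as relative, so no explicit renormalization is needed.
    return rng.choices(words, weights=weights, k=1)[0]

# Toy distribution (made up for illustration).
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_k_sample(probs, k=2))  # always "cat" or "dog", never "fish" or "rock"
```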
beginner
Explain top-p (nucleus) sampling in simple terms.
Top-p sampling chooses the smallest set of words whose combined probability is at least p (like 0.9). It adapts the number of choices based on confidence, allowing more variety when uncertain.
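The description above can be sketched in a few lines. This is only an illustrative version: `top_p_filter`, `top_p_sample`, the toy words, and the small floating-point tolerance are assumptions made for the example.

```python
import random

def top_p_filter(probs, p):
    """Return the smallest list of words (most likely first) whose total probability reaches p."""
    kept, total = [], 0.0
    for word, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(word)
        total += prob
        if total >= p - 1e-12:  # small tolerance for floating-point rounding
            break
    return kept

def top_p_sample(probs, p, rng=random):
    """Sample one word from the nucleus, weighted by the original probabilities."""
    words = top_p_filter(probs, p)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Toy distribution (made up for illustration).
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, 0.9))  # ['cat', 'dog', 'fish']
print(top_p_sample(probs, 0.9))
```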
intermediate
How does top-k sampling differ from top-p sampling?
Top-k always picks from a fixed number of words (k), while top-p picks from a variable number of words that together cover a probability threshold (p). Top-p adapts to the model's confidence.
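One way to see the difference is to count how many candidates each method keeps for a confident distribution versus an uncertain one. The two distributions and the helper names below are invented for this sketch.

```python
def top_k_size(probs, k):
    """Top-k keeps k words no matter how the probability is spread."""
    return min(k, len(probs))

def top_p_size(probs, p):
    """Top-p keeps as many words as it takes for the running total to reach p."""
    total = 0.0
    for count, prob in enumerate(sorted(probs.values(), reverse=True), start=1):
        total += prob
        if total >= p - 1e-12:  # tolerance for floating-point rounding
            return count
    return len(probs)

# Hypothetical next-word distributions.
confident = {"the": 0.92, "a": 0.05, "an": 0.02, "this": 0.01}
uncertain = {"red": 0.26, "blue": 0.25, "green": 0.25, "gold": 0.24}

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(name, "top-k:", top_k_size(dist, 2), "top-p:", top_p_size(dist, 0.9))
# confident top-k: 2 top-p: 1
# uncertain top-k: 2 top-p: 4
```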
beginner
Why do we use sampling methods like top-k or top-p instead of always picking the most likely word?
Always picking the most likely word can make text boring and repetitive. Sampling adds randomness to create more natural, diverse, and interesting outputs.
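A small sketch makes the contrast concrete: for the same distribution, greedy selection returns the same word every time, while sampling returns a mix. The distribution and function names are hypothetical, and a real generator would recompute probabilities at each step.

```python
import random

# Toy next-word distribution (made up for illustration).
probs = {"nice": 0.4, "good": 0.3, "great": 0.2, "fine": 0.1}

def greedy(probs):
    """Always return the single most likely word."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one word at random, weighted by its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
greedy_picks = {greedy(probs) for _ in range(50)}
sampled_picks = {sample(probs, rng) for _ in range(50)}

print(greedy_picks)   # {'nice'}
print(sampled_picks)  # usually several different words
```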
intermediate
What happens if you set top-p to 1.0 in top-p sampling?
Setting top-p to 1.0 keeps every word in the vocabulary (100% of the probability mass), so it is the same as sampling from the full distribution. The output can be very diverse but less coherent.
In top-k sampling, what does 'k' represent?
A. The fixed number of top words to sample from
B. The probability threshold for cumulative words
C. The total vocabulary size
D. The temperature of the model
In top-k sampling, 'k' is the fixed number of most likely words considered for sampling.
What does top-p sampling use to decide which words to sample from?
A. A fixed number of words
B. The word frequency in training data
C. The word length
D. A cumulative probability threshold
Top-p sampling selects the smallest set of most likely words whose cumulative probability reaches the threshold p.
Which sampling method adapts the number of candidate words based on model confidence?
A. Top-p sampling
B. Random sampling
C. Top-k sampling
D. Greedy decoding
Top-p sampling adapts the candidate set size based on cumulative probability, reflecting model confidence.
Why might always picking the most likely word be a bad idea for text generation?
A. It makes text too random
B. It reduces vocabulary size
C. It causes repetitive and boring text
D. It increases computation time
Always picking the most likely word leads to repetitive and less creative text.
If top-p is set very low (e.g., 0.1), what is likely to happen?
A. More words are considered for sampling
B. Only very few high-probability words are sampled
C. Sampling becomes completely random
D. The model ignores probabilities
A low top-p threshold means only the highest probability words are included, limiting diversity.
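To see how the threshold changes the candidate set, the sketch below sweeps p over a toy distribution. The helper `nucleus` and the words are assumptions made for this example.

```python
def nucleus(probs, p):
    """Most likely words, in order, until their running total reaches p."""
    kept, total = [], 0.0
    for word in sorted(probs, key=probs.get, reverse=True):
        kept.append(word)
        total += probs[word]
        if total >= p - 1e-12:  # tolerance for floating-point rounding
            break
    return kept

# Toy distribution (made up for illustration).
probs = {"sun": 0.5, "moon": 0.25, "star": 0.15, "sky": 0.07, "fog": 0.03}

for p in (0.1, 0.9, 1.0):
    print(p, nucleus(probs, p))
# 0.1 ['sun']
# 0.9 ['sun', 'moon', 'star']
# 1.0 ['sun', 'moon', 'star', 'sky', 'fog']
```

At p = 0.1 a single word already covers the threshold, while p = 1.0 keeps the whole vocabulary.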
Describe how top-k and top-p sampling work and how they differ.
Think about fixed number vs probability threshold.
Explain why sampling methods like top-k and top-p are important in AI text generation.
Consider what happens if you always pick the top word.
Practice
1. What does top-k sampling do in text generation?
easy
A. It selects the next word from the top k most likely words.
B. It selects the next word randomly from all possible words.
C. It picks words until their total probability reaches p.
D. It always picks the single most likely next word.
Solution
Step 1: Understand top-k sampling definition
Top-k sampling limits choices to the top k words with highest probabilities.
Step 2: Compare with other methods
Random selection from all possible words and picking words until total probability reaches p describe other methods; always picking the single most likely next word is greedy decoding, not sampling.
Final Answer:
It selects the next word from the top k most likely words. -> Option A
Quick Check:
Top-k = top k words [OK]
Hint: Top-k means pick from top k words only [OK]
Common Mistakes:
Confusing top-k with top-p sampling
Thinking top-k picks only one word always
Mixing top-k with greedy decoding
2. Which of the following is the correct way to apply top-p sampling in code?
easy
A. Select words until their cumulative probability reaches p.
B. Select exactly p words with highest probabilities.
C. Select the single word with probability p.
D. Select words randomly ignoring probabilities.
Solution
Step 1: Recall top-p sampling definition
Top-p sampling chooses the smallest set of words whose total probability is at least p.
Step 2: Evaluate options
Selecting words until their cumulative probability reaches p matches this definition. Selecting exactly p words confuses top-p with top-k. Random selection ignoring probabilities and selecting a single word with probability p are incorrect.
Final Answer:
Select words until their cumulative probability reaches p. -> Option A
Quick Check:
Top-p = cumulative probability ≥ p [OK]
Hint: Top-p sums probabilities to reach p [OK]
Common Mistakes:
Confusing number of words with cumulative probability
Thinking top-p picks fixed number of words
Ignoring word probabilities in selection
3. Given these word probabilities sorted descending: {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}, what words are included in top-p sampling with p=0.7?
medium
A. ['a']
B. ['a', 'b', 'c']
C. ['a', 'b']
D. ['a', 'b', 'c', 'd']
Solution
Step 1: Calculate cumulative probabilities
Sum probabilities in order: 'a' = 0.4, 'a'+'b' = 0.7, 'a'+'b'+'c' = 0.9.
Step 2: Select smallest set ≥ p=0.7
The smallest set with sum ≥ 0.7 is ['a', 'b'].
Final Answer:
['a', 'b'] -> Option C
Quick Check:
Cumulative sum ≥ 0.7 includes 'a' and 'b' [OK]
Hint: Sum probabilities until ≥ p [OK]
Common Mistakes:
Including too many words beyond p
Stopping before reaching p
Confusing top-p with top-k count
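The arithmetic in question 3 can be checked with a short script. This is a sketch using the question's own numbers; the variable names and the small tolerance for floating-point rounding are choices made for the example.

```python
from itertools import accumulate

probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # already sorted descending
p = 0.7

words = list(probs)
running = list(accumulate(probs.values()))  # running totals after a, a+b, a+b+c, ...

# Index of the first running total that reaches p.
cutoff = next(i for i, total in enumerate(running) if total >= p - 1e-12)
nucleus = words[: cutoff + 1]

print(nucleus)  # ['a', 'b']
```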
4. You wrote code for top-k sampling but it always picks only one word. What is the likely bug?
medium
A. You summed probabilities instead of sorting words.
B. You set k=1 instead of a larger number.
C. You used top-p sampling code instead of top-k.
D. You forgot to normalize probabilities.
Solution
Step 1: Understand top-k parameter effect
Setting k=1 means only the single most likely word is chosen.
Step 2: Check other options
Summing probabilities instead of sorting, or using top-p code by mistake, would not by itself force a single candidate on every step; skipping normalization changes the probabilities, not the number of candidates.
Final Answer:
You set k=1 instead of a larger number. -> Option B
Quick Check:
k=1 picks only one word [OK]
Hint: Check if k=1 limits output to one word [OK]
Common Mistakes:
Confusing top-k and top-p parameters
Ignoring parameter values in code
Assuming normalization fixes count
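When debugging a case like question 4, it can help to print the candidate list for the value of k in use. The helper below is a hypothetical sketch, not taken from any particular library.

```python
def top_k_candidates(probs, k):
    """Return the k most likely words, most likely first."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Toy distribution (made up for illustration).
probs = {"yes": 0.6, "no": 0.3, "maybe": 0.1}

for k in (1, 3):
    print(k, top_k_candidates(probs, k))
# 1 ['yes']
# 3 ['yes', 'no', 'maybe']

# With k=1 the only candidate is the most likely word, which is greedy decoding.
print(top_k_candidates(probs, 1) == [max(probs, key=probs.get)])  # True
```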
5. You want to generate text that balances creativity and coherence. Which approach is best?
hard
A. Use random sampling ignoring probabilities.
B. Use greedy decoding to always pick the most likely word.
C. Use top-k sampling with k=1 only.
D. Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together.
Solution
Step 1: Understand creativity vs coherence tradeoff
Greedy decoding is too rigid; random sampling is too chaotic; top-k with k=1 is greedy.
Step 2: Combine top-k and top-p for balance
Using moderate k and p near 0.9 limits choices to plausible words but allows variety, improving naturalness.
Final Answer:
Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together. -> Option D
Quick Check:
Combining top-k and top-p balances randomness and coherence [OK]
Hint: Combine top-k and top-p for best text quality [OK]
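One possible way to combine the two filters is sketched below. It assumes top-k is applied first and that the top-p threshold is measured against probabilities renormalized within the top-k set; real implementations may order or normalize these steps differently. The function name, defaults, and toy words are hypothetical, and k is assumed to be at least 1.

```python
import random

def top_k_top_p_sample(probs, k=50, p=0.9, rng=random):
    """Keep the k most likely words, trim to the top-p nucleus, then sample."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total_k = sum(prob for _, prob in ranked)

    kept, running = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        running += prob / total_k  # share of the top-k probability mass
        if running >= p - 1e-12:   # tolerance for floating-point rounding
            break

    words, weights = zip(*kept)
    return rng.choices(words, weights=weights, k=1)[0]

# Toy distribution (made up for illustration).
probs = {"river": 0.45, "lake": 0.3, "sea": 0.15, "pond": 0.06, "puddle": 0.04}
print(top_k_top_p_sample(probs, k=3, p=0.9))  # one of "river", "lake", "sea"
```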