
Top-p and top-k sampling in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Top-k Sampling
In top-k sampling, what does the parameter k control when generating text from a language model?
A. The maximum length of the generated text sequence
B. The number of highest probability tokens considered for sampling at each step
C. The temperature scaling factor applied to logits before sampling
D. The cumulative probability threshold to include tokens for sampling
💡 Hint
Think about how many tokens the model looks at before picking the next word.
🧠 Conceptual
intermediate
Understanding Top-p (Nucleus) Sampling
What does the parameter p represent in top-p (nucleus) sampling?
A. The maximum number of tokens generated
B. The temperature value to adjust randomness
C. The fixed number of tokens to consider for sampling
D. The cumulative probability threshold to include tokens for sampling
💡 Hint
It relates to the total probability mass of tokens considered.
Predict Output
advanced
Output of Top-k Sampling Code Snippet
What is the output of the following Python code simulating top-k sampling probabilities?
import numpy as np
np.random.seed(0)
logits = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
k = 3
# Select top-k logits
indices = np.argsort(logits)[-k:]
topk_probs = np.exp(logits[indices]) / np.sum(np.exp(logits[indices]))
sampled_index = np.random.choice(indices, p=topk_probs)
print(sampled_index)
A. 3
B. 0
C. 4
D. 2
💡 Hint
Check which indices are top 3 and how probabilities are computed.
Metrics
advanced
Effect of Top-p on Diversity Metrics
If you decrease the top-p value from 0.9 to 0.5 during text generation, what is the expected effect on the diversity of generated text?
A. Diversity decreases because fewer tokens are considered
B. Diversity increases because more tokens are considered
C. Diversity remains the same because top-p does not affect token selection
D. Diversity fluctuates randomly regardless of top-p
💡 Hint
Think about how cumulative probability threshold limits token choices.
🔧 Debug
expert
Identifying Error in Top-k Sampling Implementation
Consider this code snippet for top-k sampling. Which option correctly identifies the error causing incorrect sampling?
import numpy as np
logits = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = 2
indices = np.argsort(logits)[:k]
topk_logits = logits[indices]
probs = np.exp(topk_logits) / np.sum(np.exp(topk_logits))
sampled_index = np.random.choice(indices, p=probs)
print(sampled_index)
A. The code selects the lowest k logits instead of the highest k logits
B. The softmax calculation is incorrect because it should use log probabilities
C. The sampling should be done over logits, not indices
D. The random seed is missing, causing non-reproducible results
💡 Hint
Check how argsort is used to select top-k logits.

Practice

1. What does top-k sampling do in text generation?
easy
A. It selects the next word from the top k most likely words.
B. It selects the next word randomly from all possible words.
C. It picks words until their total probability reaches p.
D. It always picks the single most likely next word.

Solution

  1. Step 1: Understand top-k sampling definition

    Top-k sampling limits choices to the top k words with highest probabilities.
  2. Step 2: Compare with other methods

    Random selection from all possible words and picking words until total probability reaches p describe other methods; always picking the single most likely next word is greedy decoding, not sampling.
  3. Final Answer:

    It selects the next word from the top k most likely words. -> Option A
  4. Quick Check:

    Top-k = top k words [OK]
Hint: Top-k means pick from top k words only [OK]
Common Mistakes:
  • Confusing top-k with top-p sampling
  • Thinking top-k picks only one word always
  • Mixing top-k with greedy decoding
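The definition in the solution above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `top_k_sample`, the example logits, and the seeded generator are assumptions chosen for the example.

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring tokens (illustrative helper)."""
    logits = np.asarray(logits, dtype=float)
    # argsort is ascending, so the last k positions hold the k largest logits.
    top = np.argsort(logits)[-k:]
    # Softmax over the kept logits only; subtracting the max avoids overflow.
    shifted = logits[top] - logits[top].max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)  # seeded only so repeated runs match
example_logits = [0.1, 0.2, 0.3, 0.4, 0.5]
token = top_k_sample(example_logits, k=3, rng=rng)
print(token)  # always one of the indices 2, 3, or 4
```

Tokens outside the top k receive zero probability, and the kept tokens are renormalized so their probabilities sum to 1 before sampling.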
2. Which of the following is the correct way to apply top-p sampling in code?
easy
A. Select words until their cumulative probability reaches or exceeds p.
B. Select exactly p words with highest probabilities.
C. Select the single word with probability p.
D. Select words randomly ignoring probabilities.

Solution

  1. Step 1: Recall top-p sampling definition

    Top-p sampling chooses the smallest set of words whose total probability is at least p.
  2. Step 2: Evaluate options

    Selecting words until their cumulative probability reaches or exceeds p matches this definition. Selecting exactly p words confuses top-p with top-k. Random selection ignoring probabilities and selecting a single word with probability p are incorrect.
  3. Final Answer:

    Select words until their cumulative probability reaches or exceeds p. -> Option A
  4. Quick Check:

    Top-p = cumulative probability ≥ p [OK]
Hint: Top-p sums probabilities to reach p [OK]
Common Mistakes:
  • Confusing number of words with cumulative probability
  • Thinking top-p picks fixed number of words
  • Ignoring word probabilities in selection
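One way to express the "smallest set whose total probability is at least p" rule on an array of probabilities is sketched below. The helper names `top_p_indices` and `top_p_sample`, the small tolerance, and the example distribution are assumptions made for illustration.

```python
import numpy as np

def top_p_indices(probs, p, tol=1e-12):
    """Indices of the smallest high-probability set whose total reaches p (illustrative)."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    cumulative = np.cumsum(probs[order])     # running probability mass
    # First position where the running total reaches p; tol guards against float rounding.
    cutoff = int(np.searchsorted(cumulative, p - tol)) + 1
    return order[:min(cutoff, len(order))]

def top_p_sample(probs, p, rng):
    """Sample one index from the nucleus after renormalizing its probabilities."""
    probs = np.asarray(probs, dtype=float)
    keep = top_p_indices(probs, p)
    renormalized = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=renormalized))

rng = np.random.default_rng(0)
example_probs = [0.1, 0.4, 0.2, 0.3]
print(sorted(top_p_indices(example_probs, 0.7).tolist()))  # [1, 3]
print(top_p_sample(example_probs, 0.7, rng))               # either 1 or 3
```

Unlike top-k, the number of kept tokens is not fixed: a peaked distribution yields a small set, a flat one yields a larger set.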
3. Given these word probabilities sorted descending: {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}, what words are included in top-p sampling with p=0.7?
medium
A. ['a']
B. ['a', 'b', 'c']
C. ['a', 'b']
D. ['a', 'b', 'c', 'd']

Solution

  1. Step 1: Calculate cumulative probabilities

    Sum probabilities in order: 'a' = 0.4, 'a'+'b' = 0.7, 'a'+'b'+'c' = 0.9.
  2. Step 2: Select smallest set ≥ p=0.7

    The smallest set with sum ≥ 0.7 is ['a', 'b'].
  3. Final Answer:

    ['a', 'b'] -> Option C
  4. Quick Check:

    Cumulative sum ≥ 0.7 includes 'a' and 'b' [OK]
Hint: Sum probabilities until ≥ p [OK]
Common Mistakes:
  • Including too many words beyond p
  • Stopping before reaching p
  • Confusing top-p with top-k count
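The cumulative-sum walk in this solution can be checked with plain Python. The function name `nucleus` and the tolerance `tol` are assumptions for this sketch; the tolerance is there because sums of floats such as 0.4 + 0.3 are not guaranteed to compare exactly against 0.7.

```python
def nucleus(word_probs, p, tol=1e-12):
    """Return the shortest prefix of words, by descending probability, whose total reaches p."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    chosen, running = [], 0.0
    for word, prob in ranked:
        chosen.append(word)
        running += prob
        if running >= p - tol:  # stop as soon as the threshold is reached
            break
    return chosen

probs = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
print(nucleus(probs, 0.7))  # ['a', 'b']
```

Raising p slightly above 0.7 would pull 'c' into the set, which shows how sensitive the cutoff is when a cumulative sum lands exactly on the threshold.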
4. You wrote code for top-k sampling but it always picks only one word. What is the likely bug?
medium
A. You summed probabilities instead of sorting words.
B. You set k=1 instead of a larger number.
C. You used top-p sampling code instead of top-k.
D. You forgot to normalize probabilities.

Solution

  1. Step 1: Understand top-k parameter effect

    Setting k=1 means only the single most likely word is chosen.
  2. Step 2: Check other options

    Summing probabilities instead of sorting, or using top-p code by mistake, would not by itself restrict the candidate set to a single word; normalization changes the probabilities but not how many candidates remain.
  3. Final Answer:

    You set k=1 instead of a larger number. -> Option B
  4. Quick Check:

    k=1 picks only one word [OK]
Hint: Check if k=1 limits output to one word [OK]
Common Mistakes:
  • Confusing top-k and top-p parameters
  • Ignoring parameter values in code
  • Assuming normalization fixes count
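A quick way to confirm this diagnosis is to count how many distinct tokens a sampler returns over many draws. The sketch below assumes a simple NumPy top-k sampler; `top_k_sample`, `distinct_tokens`, the logits, and the draw count are all hypothetical names and values chosen for the example.

```python
import numpy as np

def top_k_sample(logits, k, rng):
    """Sample one index from the k largest logits (illustrative helper)."""
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]
    weights = np.exp(logits[top] - logits[top].max())
    return int(rng.choice(top, p=weights / weights.sum()))

def distinct_tokens(logits, k, n_draws=200, seed=0):
    """Set of token indices observed across repeated draws with a given k."""
    rng = np.random.default_rng(seed)
    return {top_k_sample(logits, k, rng) for _ in range(n_draws)}

example_logits = [0.1, 0.2, 0.3, 0.4, 0.5]
print(distinct_tokens(example_logits, k=1))  # {4}: k=1 always returns the argmax
print(distinct_tokens(example_logits, k=3))  # a subset of {2, 3, 4}, normally more than one token
```

If the set has a single element no matter how many draws are made, k=1 (or an equivalent over-restrictive filter) is the first thing to check.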
5. You want to generate text that balances creativity and coherence. Which approach is best?
hard
A. Use random sampling ignoring probabilities.
B. Use greedy decoding to always pick the most likely word.
C. Use top-k sampling with k=1 only.
D. Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together.

Solution

  1. Step 1: Understand creativity vs coherence tradeoff

    Greedy decoding is too rigid, random sampling that ignores probabilities is too chaotic, and top-k with k=1 is equivalent to greedy decoding.
  2. Step 2: Combine top-k and top-p for balance

    Using moderate k and p near 0.9 limits choices to plausible words but allows variety, improving naturalness.
  3. Final Answer:

    Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together. -> Option D
  4. Quick Check:

    Combining top-k and top-p balances randomness and coherence [OK]
Hint: Combine top-k and top-p for best text quality [OK]
Common Mistakes:
  • Choosing greedy decoding for creativity
  • Ignoring probability thresholds
  • Using too small k or p values
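Combining the two filters might look like the sketch below. It assumes top-k is applied first and top-p is then applied to the renormalized survivors; other orderings are possible, and the function name `top_k_top_p_sample`, the logits, and the parameter values are illustrative assumptions rather than any particular library's behavior.

```python
import numpy as np

def top_k_top_p_sample(logits, k, p, rng, tol=1e-12):
    """Sample one index after a top-k filter followed by a top-p filter (illustrative)."""
    logits = np.asarray(logits, dtype=float)
    # Step 1 (top-k): keep the k highest logits, most likely first.
    order = np.argsort(logits)[::-1][:k]
    kept = np.exp(logits[order] - logits[order].max())
    probs = kept / kept.sum()                 # renormalize over the k survivors
    # Step 2 (top-p): shortest prefix whose cumulative probability reaches p.
    cutoff = int(np.searchsorted(np.cumsum(probs), p - tol)) + 1
    cutoff = min(cutoff, len(order))
    final = probs[:cutoff] / probs[:cutoff].sum()
    return int(rng.choice(order[:cutoff], p=final))

rng = np.random.default_rng(0)
example_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(top_k_top_p_sample(example_logits, k=3, p=0.9, rng=rng))  # one of 0, 1, 2
```

Here top-k caps the candidate count even when the distribution is flat, while top-p trims the tail further when the distribution is peaked.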