Top-p and top-k sampling in Prompt Engineering / GenAI - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is top-k sampling in language generation?
Top-k sampling picks the next word from the top k most likely words predicted by the model. It limits choices to a fixed number, making output more focused but still random.
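The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the function name `top_k_sample` and the toy word distribution are assumptions made for this example, and real models work over token logits rather than a small dictionary.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample one word from the k most likely words in `probs` (word -> probability)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep only the k highest-probability words.
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [prob for _, prob in top]
    # random.choices treats the weights as relative, so they need not sum to 1.
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical next-word distribution, for illustration only.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_k_sample(probs, k=2))  # prints "cat" or "dog", never "fish" or "rock"
```

With k=2, the two least likely words can never be chosen, but the choice between the remaining two is still random.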
beginner
Explain top-p (nucleus) sampling in simple terms.
Top-p sampling chooses the smallest set of words whose combined probability is at least p (like 0.9). It adapts the number of choices based on confidence, allowing more variety when uncertain.
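A small Python sketch of this idea follows. The names `nucleus` and `top_p_sample`, the two toy distributions, and the floating-point tolerance are all assumptions made for illustration.

```python
import random

def nucleus(probs, p):
    """Return the smallest list of words (most likely first) whose total probability is at least p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for word, prob in ranked:
        kept.append(word)
        total += prob
        # The small tolerance guards against floating-point rounding (an assumption of this sketch).
        if total >= p - 1e-12:
            break
    return kept

def top_p_sample(probs, p, rng=random):
    """Sample one word from the nucleus, weighted by the original probabilities."""
    words = nucleus(probs, p)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical distributions: one confident, one uncertain.
confident = {"Paris": 0.92, "Lyon": 0.05, "Nice": 0.03}
uncertain = {"red": 0.3, "blue": 0.25, "green": 0.25, "pink": 0.2}
print(nucleus(confident, 0.9))  # ['Paris']
print(nucleus(uncertain, 0.9))  # ['red', 'blue', 'green', 'pink']
```

The same threshold keeps one word when the model is confident and four when it is not, which is the adaptive behavior described above.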
intermediate
How does top-k sampling differ from top-p sampling?
Top-k always picks from a fixed number of words (k), while top-p picks from a variable number of words that together cover a probability threshold (p). Top-p adapts to the model's confidence.
beginner
Why do we use sampling methods like top-k or top-p instead of always picking the most likely word?
Always picking the most likely word can make text boring and repetitive. Sampling adds randomness to create more natural, diverse, and interesting outputs.
intermediate
What happens if you set top-p to 1.0 in top-p sampling?
Setting top-p to 1.0 keeps every possible word, because only the full vocabulary adds up to a probability of 1.0. The filter is effectively switched off and the model samples from its entire distribution, which can produce very diverse but less coherent text.
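The effect of the threshold can be checked numerically. The sketch below counts how many candidates survive for several values of p; the helper `nucleus_size` and the five-word distribution are hypothetical and chosen only for illustration.

```python
def nucleus_size(sorted_probs, p):
    """Count how many of the highest probabilities are needed to reach a total of at least p."""
    total = 0.0
    for count, prob in enumerate(sorted_probs, start=1):
        total += prob
        # Tolerance for floating-point rounding (an assumption of this sketch).
        if total >= p - 1e-12:
            return count
    return len(sorted_probs)

# Hypothetical probabilities, already sorted from most to least likely.
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
sizes = {p: nucleus_size(probs, p) for p in (0.1, 0.6, 0.9, 1.0)}
print(sizes)  # {0.1: 1, 0.6: 2, 0.9: 4, 1.0: 5}
```

A very low p leaves only the single most likely word, while p=1.0 keeps all five.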
In top-k sampling, what does 'k' represent?
A. The fixed number of top words to sample from
B. The probability threshold for cumulative words
C. The total vocabulary size
D. The temperature of the model
What does top-p sampling use to decide which words to sample from?
A. A fixed number of words
B. The word frequency in training data
C. The word length
D. A cumulative probability threshold
Which sampling method adapts the number of candidate words based on model confidence?
A. Top-p sampling
B. Random sampling
C. Top-k sampling
D. Greedy decoding
Why might always picking the most likely word be a bad idea for text generation?
A. It makes text too random
B. It reduces vocabulary size
C. It causes repetitive and boring text
D. It increases computation time
If top-p is set very low (e.g., 0.1), what is likely to happen?
A. More words are considered for sampling
B. Only very few high-probability words are sampled
C. Sampling becomes completely random
D. The model ignores probabilities
Describe how top-k and top-p sampling work and how they differ.
Think about fixed number vs probability threshold.
Explain why sampling methods like top-k and top-p are important in AI text generation.
Consider what happens if you always pick the top word.

      Practice

      1. What does top-k sampling do in text generation?
      easy
      A. It selects the next word from the top k most likely words.
      B. It selects the next word randomly from all possible words.
      C. It picks words until their total probability reaches p.
      D. It always picks the single most likely next word.

      Solution

      1. Step 1: Understand top-k sampling definition

        Top-k sampling limits choices to the top k words with highest probabilities.
      2. Step 2: Compare with other methods

        Selecting randomly from all possible words is unrestricted sampling, and picking words until their total probability reaches p is top-p sampling. Always picking the single most likely next word is greedy decoding, not sampling.
      3. Final Answer:

        It selects the next word from the top k most likely words. -> Option A
      4. Quick Check:

        Top-k = top k words [OK]
      Hint: Top-k means pick from top k words only [OK]
      Common Mistakes:
      • Confusing top-k with top-p sampling
      • Thinking top-k picks only one word always
      • Mixing top-k with greedy decoding
      2. Which of the following correctly describes how top-p sampling selects candidate words?
      easy
      A. Select the most likely words until their cumulative probability reaches at least p.
      B. Select exactly p words with highest probabilities.
      C. Select the single word with probability p.
      D. Select words randomly ignoring probabilities.

      Solution

      1. Step 1: Recall top-p sampling definition

        Top-p sampling chooses the smallest set of words whose total probability is at least p.
      2. Step 2: Evaluate options

        Adding words in descending order of probability until their cumulative probability reaches at least p matches this definition. Selecting exactly p words confuses top-p with top-k. Random selection ignoring probabilities and selecting a single word with probability p are incorrect.
      3. Final Answer:

        Select the most likely words until their cumulative probability reaches at least p. -> Option A
      4. Quick Check:

        Top-p = cumulative probability ≥ p [OK]
      Hint: Top-p sums probabilities to reach p [OK]
      Common Mistakes:
      • Confusing number of words with cumulative probability
      • Thinking top-p picks fixed number of words
      • Ignoring word probabilities in selection
      3. Given these word probabilities sorted descending: {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}, what words are included in top-p sampling with p=0.7?
      medium
      A. ['a']
      B. ['a', 'b', 'c']
      C. ['a', 'b']
      D. ['a', 'b', 'c', 'd']

      Solution

      1. Step 1: Calculate cumulative probabilities

        Sum probabilities in order: 'a' = 0.4, 'a'+'b' = 0.7, 'a'+'b'+'c' = 0.9.
      2. Step 2: Select smallest set ≥ p=0.7

        The smallest set with sum ≥ 0.7 is ['a', 'b'].
      3. Final Answer:

        ['a', 'b'] -> Option C
      4. Quick Check:

        Cumulative sum ≥ 0.7 includes 'a' and 'b' [OK]
      Hint: Sum probabilities until ≥ p [OK]
      Common Mistakes:
      • Including too many words beyond p
      • Stopping before reaching p
      • Confusing top-p with top-k count
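The cumulative sum in this question can be reproduced with a short Python loop. It uses the probabilities given in the question; the tolerance on the comparison is an assumption added to keep the check robust to floating-point rounding.

```python
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
p = 0.7

selected, cumulative = [], 0.0
# Walk the words from most to least likely, stopping once the running total reaches p.
for word, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    selected.append(word)
    cumulative += prob
    if cumulative >= p - 1e-12:  # tolerance for floating-point rounding
        break

print(selected)  # ['a', 'b']
```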
      4. You wrote code for top-k sampling but it always picks only one word. What is the likely bug?
      medium
      A. You summed probabilities instead of sorting words.
      B. You set k=1 instead of a larger number.
      C. You used top-p sampling code instead of top-k.
      D. You forgot to normalize probabilities.

      Solution

      1. Step 1: Understand top-k parameter effect

        Setting k=1 means only the single most likely word is chosen.
      2. Step 2: Check other options

        Summing probabilities instead of sorting, or using top-p code by mistake, would not consistently restrict the output to one word. Skipping normalization changes the sampling weights but not the number of candidates.
      3. Final Answer:

        You set k=1 instead of a larger number. -> Option B
      4. Quick Check:

        k=1 picks only one word [OK]
      Hint: Check if k=1 limits output to one word [OK]
      Common Mistakes:
      • Confusing top-k and top-p parameters
      • Ignoring parameter values in code
      • Assuming normalization fixes count
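The bug in this question is easy to reproduce. The sketch below draws 100 samples with k=1 and with k=3 and collects the distinct words seen; the helper `top_k_sample`, the toy distribution, and the fixed seed are assumptions made for illustration.

```python
import random

def top_k_sample(probs, k, rng):
    """Sample one word from the k most likely words in `probs`."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words, weights = zip(*top)
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical next-word distribution.
probs = {"the": 0.5, "a": 0.3, "one": 0.2}
rng = random.Random(0)  # fixed seed so the run is repeatable

with_k1 = {top_k_sample(probs, 1, rng) for _ in range(100)}
with_k3 = {top_k_sample(probs, 3, rng) for _ in range(100)}
print(with_k1)  # {'the'}
print(with_k3)  # more than one distinct word
```

With k=1 there is only one candidate, so every draw returns the most likely word, exactly as greedy decoding would.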
      5. You want to generate text that balances creativity and coherence. Which approach is best?
      hard
      A. Use random sampling ignoring probabilities.
      B. Use greedy decoding to always pick the most likely word.
      C. Use top-k sampling with k=1 only.
      D. Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together.

      Solution

      1. Step 1: Understand creativity vs coherence tradeoff

        Greedy decoding is too rigid; random sampling that ignores probabilities is too chaotic; top-k with k=1 is equivalent to greedy decoding.
      2. Step 2: Combine top-k and top-p for balance

        Using moderate k and p near 0.9 limits choices to plausible words but allows variety, improving naturalness.
      3. Final Answer:

        Use top-k sampling with a moderate k and top-p sampling with p around 0.9 together. -> Option D
      4. Quick Check:

        Combining top-k and top-p balances randomness and coherence [OK]
      Hint: Combine top-k and top-p to balance variety and coherence [OK]
      Common Mistakes:
      • Choosing greedy decoding for creativity
      • Ignoring probability thresholds
      • Using too small k or p values
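One way to combine the two filters is sketched below. It applies top-k first and then top-p over the renormalized survivors; that ordering, the function names, the default values, and the toy distribution are assumptions of this example rather than the only valid design.

```python
import random

def filter_top_k_top_p(probs, k, p):
    """Keep the k most likely words, then the smallest prefix whose renormalized total is at least p.

    Returns a dict of word -> weight, where weights are relative to the top-k mass.
    """
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    mass = sum(prob for _, prob in top)
    kept, total = {}, 0.0
    for word, prob in top:
        kept[word] = prob / mass
        total += prob / mass
        # Tolerance for floating-point rounding (an assumption of this sketch).
        if total >= p - 1e-12:
            break
    return kept

def sample(probs, k=50, p=0.9, rng=random):
    """Draw one word after applying both filters."""
    kept = filter_top_k_top_p(probs, k, p)
    # random.choices treats the weights as relative, so no further renormalization is needed.
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]

# Hypothetical next-word distribution.
probs = {"sun": 0.5, "moon": 0.25, "star": 0.15, "sky": 0.06, "xylophone": 0.04}
print(list(filter_top_k_top_p(probs, k=4, p=0.9)))  # ['sun', 'moon', 'star']
```

Here top-k removes the least likely word outright, and top-p then trims one more low-probability candidate, leaving three plausible words to sample from.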