
Padding and sequence length in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Why do we pad sequences in NLP models?

In natural language processing, why is padding used when preparing sequences for models?

A. To make all sequences the same length so they can be processed in batches.
B. To convert text into numerical values.
C. To increase the vocabulary size of the model.
D. To remove stop words from the sequences.
💡 Hint

Think about how models handle input data in batches.

Predict Output
intermediate
Output length after padding sequences

Given the following code that pads sequences to a max length of 5, what is the output?

from tensorflow.keras.preprocessing.sequence import pad_sequences
sequences = [[1, 2, 3], [4, 5], [6]]
padded = pad_sequences(sequences, maxlen=5, padding='post')
print(padded.tolist())
A. [[1, 2, 3], [4, 5], [6]]
B. [[0, 0, 1, 2, 3], [0, 0, 0, 4, 5], [0, 0, 0, 0, 6]]
C. [[1, 2, 3, 0, 0], [4, 5, 0, 0, 0], [6, 0, 0, 0, 0]]
D. [[1, 2, 3, 0], [4, 5, 0], [6, 0, 0]]
💡 Hint

Check the padding='post' argument and maxlen=5.

Model Choice
advanced
Choosing sequence length for RNN training

You have text sequences of varying lengths from 10 to 100 tokens. You want to train an RNN model efficiently. Which sequence length choice is best?

A. Pad all sequences to length 100 to keep full information.
B. Pad sequences to around the median length (about 50) and truncate longer ones.
C. Pad all sequences to length 10 to speed up training.
D. Do not pad sequences and train with variable lengths.
💡 Hint

Consider balancing information retention and training speed.

Metrics
advanced
Effect of padding on model accuracy

When training a text classification model, how can excessive padding affect accuracy?

A. Padding causes the model to overfit the training data.
B. Padding has no effect on accuracy since it is ignored by the model.
C. More padding always improves accuracy by standardizing input size.
D. Excessive padding can introduce noise and reduce accuracy.
💡 Hint

Think about how meaningless zeros might affect learning.

🔧 Debug
expert
Why does this padding code truncate unexpectedly?

Consider this code snippet:

from tensorflow.keras.preprocessing.sequence import pad_sequences
sequences = [[1, 2], [3, 4, 5, 6]]
padded = pad_sequences(sequences, maxlen=3, padding='post')
print(padded)

Why does this code truncate unexpectedly?

A. Because maxlen is shorter than the second sequence, so that sequence is truncated, and since truncating is not specified it defaults to 'pre', dropping values from the start rather than the end.
B. Because pad_sequences requires all sequences to be the same length before padding.
C. Because sequences contain integers instead of strings.
D. Because padding='post' is invalid and should be 'pre'.
💡 Hint

Check the maxlen parameter and sequence lengths.

Practice

1. What is the main purpose of padding in text sequences for machine learning models?
easy
A. To convert text into numbers without changing length
B. To make all sequences the same length by adding extra values
C. To randomly shuffle the words in sequences
D. To remove important words from sequences

Solution

  1. Step 1: Understand padding concept

    Padding adds extra values (usually zeros) to sequences to make them all the same length.
  2. Step 2: Recognize why padding is used

    This uniform length helps models process batches of data efficiently without errors.
  3. Final Answer:

    To make all sequences the same length by adding extra values -> Option B
  4. Quick Check:

    Padding = same length sequences [OK]
Hint: Padding adds extra tokens to equalize sequence lengths [OK]
Common Mistakes:
  • Thinking padding removes words
  • Confusing padding with shuffling
  • Believing padding changes text meaning
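The idea in the solution above can be sketched in plain Python. This is a minimal illustration, not the Keras implementation: the helper `pad_batch` is hypothetical, pads with 0 at the end of each sequence, and pads to the length of the longest sequence in the batch.

```python
def pad_batch(sequences, value=0):
    """Hypothetical helper: pad each sequence at the end to the batch's longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]


sequences = [[1, 2, 3], [4, 5], [6]]   # ragged: lengths 3, 2, 1
batch = pad_batch(sequences)
print(batch)                            # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
print({len(row) for row in batch})      # {3}: every row now has the same length
```

Keras's `pad_sequences` does the same job for a whole list of sequences, but note that it pads at the start by default (`padding='pre'`).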
2. Which of the following is the correct way to pad sequences using Python's Keras library?
easy
A. pad_sequences(sequences, maxlen=10, shuffle=True)
B. pad_sequences(sequences, maxlen=10, reverse=True)
C. pad_sequences(sequences, maxlen=10, padding='post')
D. pad_sequences(sequences, maxlen=10, drop=True)

Solution

  1. Step 1: Identify correct padding function parameters

    Keras's pad_sequences uses 'padding' to specify where to add zeros, e.g., 'post' means after the sequence.
  2. Step 2: Check options for valid parameters

    Only pad_sequences(sequences, maxlen=10, padding='post') uses a valid parameter 'padding' with a correct value 'post'. Others use invalid parameters like shuffle, reverse, drop.
  3. Final Answer:

    pad_sequences(sequences, maxlen=10, padding='post') -> Option C
  4. Quick Check:

    Correct padding param = pad_sequences(sequences, maxlen=10, padding='post') [OK]
Hint: Use 'padding' param in pad_sequences, not shuffle or drop [OK]
Common Mistakes:
  • Using non-existent parameters like shuffle or drop
  • Confusing padding location with sequence order
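  • Forgetting to set maxlen when a specific fixed length is needed (without it, pad_sequences pads every sequence to the length of the longest one)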
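To see what the `padding` argument controls, here is a small pure-Python stand-in for the padding side only. The helper `pad_one` is hypothetical and written for illustration; it assumes 0 as the pad value and a sequence no longer than `maxlen`, and, like Keras, it accepts only 'pre' or 'post'.

```python
def pad_one(seq, maxlen, padding="pre", value=0):
    """Hypothetical stand-in: pad a single sequence (assumed no longer than maxlen)."""
    if padding not in ("pre", "post"):
        # Keras also raises a ValueError for an unrecognized padding value
        raise ValueError(f"padding must be 'pre' or 'post', got {padding!r}")
    fill = [value] * (maxlen - len(seq))
    return fill + list(seq) if padding == "pre" else list(seq) + fill


print(pad_one([4, 5], 5, padding="pre"))   # [0, 0, 0, 4, 5]
print(pad_one([4, 5], 5, padding="post"))  # [4, 5, 0, 0, 0]
try:
    pad_one([4, 5], 5, padding="end")
except ValueError as err:
    print(err)  # padding must be 'pre' or 'post', got 'end'
```

Only the side where zeros are added changes; the original tokens keep their order either way.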
3. Given the code below, what will be the output shape of padded_sequences?
from tensorflow.keras.preprocessing.sequence import pad_sequences
sequences = [[1, 2, 3], [4, 5], [6]]
padded_sequences = pad_sequences(sequences, maxlen=4, padding='pre')
medium
A. (3, 4)
B. (4, 3)
C. (3, 3)
D. (4, 4)

Solution

  1. Step 1: Count number of sequences

    There are 3 sequences: [1,2,3], [4,5], and [6].
  2. Step 2: Understand padding effect on length

    maxlen=4 means each sequence is padded or truncated to length 4, so the output shape is (3, 4): 3 sequences, each of length 4.
  3. Final Answer:

    (3, 4) -> Option A
  4. Quick Check:

    Number of sequences = 3, length = 4 [OK]
Hint: Output shape = (number of sequences, maxlen) [OK]
Common Mistakes:
  • Confusing maxlen with number of sequences
  • Mixing up padding='pre' with output shape
  • Assuming shape changes with padding side
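A quick way to check the shape reasoning is to mimic `pad_sequences(sequences, maxlen=4, padding='pre')` in plain Python. The helper `pad_fixed` below is hypothetical; it assumes 0 as the pad value and truncation from the start (the Keras default), and is only a sketch.

```python
def pad_fixed(sequences, maxlen, padding="pre", value=0):
    """Hypothetical sketch: pad or truncate every sequence to exactly maxlen items (maxlen > 0)."""
    rows = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]            # keep at most the last maxlen items
        fill = [value] * (maxlen - len(seq))
        rows.append(fill + seq if padding == "pre" else seq + fill)
    return rows


sequences = [[1, 2, 3], [4, 5], [6]]
for side in ("pre", "post"):
    padded = pad_fixed(sequences, maxlen=4, padding=side)
    shape = (len(padded), len(padded[0]))   # (number of sequences, maxlen)
    print(side, shape, padded)
# pre (3, 4) [[0, 1, 2, 3], [0, 0, 4, 5], [0, 0, 0, 6]]
# post (3, 4) [[1, 2, 3, 0], [4, 5, 0, 0], [6, 0, 0, 0]]
```

The padding side moves the zeros around, but the shape stays (3, 4) in both cases.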
4. You wrote this code but get an error: TypeError: pad_sequences() got an unexpected keyword argument 'pad'. What is the likely mistake?
padded = pad_sequences(sequences, maxlen=5, pad='post')
medium
A. The parameter name should be 'padding', not 'pad'
B. maxlen must be smaller than sequence length
C. Sequences must be numpy arrays, not lists
D. pad_sequences requires a 'value' parameter

Solution

  1. Step 1: Identify error cause from message

    The error says 'unexpected keyword argument pad', meaning 'pad' is not a valid parameter.
  2. Step 2: Recall correct parameter name

    The correct parameter to specify padding side is 'padding', not 'pad'.
  3. Final Answer:

    The parameter name should be 'padding', not 'pad' -> Option A
  4. Quick Check:

    Correct param = 'padding' [OK]
Hint: Use 'padding' param, not 'pad' [OK]
Common Mistakes:
  • Using 'pad' instead of 'padding'
  • Assuming maxlen must be smaller than sequences
  • Thinking sequences must be numpy arrays
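The error in this question comes from Python itself: a function with a fixed set of keyword parameters rejects any name it does not declare. The stub below is hypothetical (it only mirrors the parameter names `maxlen` and `padding` and does no padding) and shows the same failure and the fix.

```python
def pad_sequences_stub(sequences, maxlen=None, padding="pre"):
    """Hypothetical stub with a fixed keyword signature, used only to show the error."""
    return {"maxlen": maxlen, "padding": padding}


sequences = [[1, 2], [3]]
try:
    pad_sequences_stub(sequences, maxlen=5, pad="post")   # misspelled keyword
except TypeError as err:
    print(err)  # message includes: got an unexpected keyword argument 'pad'

print(pad_sequences_stub(sequences, maxlen=5, padding="post"))
# {'maxlen': 5, 'padding': 'post'}
```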
5. You have text sequences of varying lengths. You want to pad shorter sequences at the start to length 10 and, for longer ones, keep only the last 10 words. Which code correctly achieves this using Keras?
hard
A. pad_sequences(sequences, maxlen=10, padding='post', truncating='pre')
B. pad_sequences(sequences, maxlen=10, padding='post', truncating='post')
C. pad_sequences(sequences, maxlen=10, padding='pre', truncating='post')
D. pad_sequences(sequences, maxlen=10, padding='pre', truncating='pre')

Solution

  1. Step 1: Understand padding and truncating sides

    padding='pre' adds zeros at the start; truncating='pre' removes words from the start, keeping the last words.
  2. Step 2: Match requirement to keep last 10 words

    To keep the last 10 words, truncate from the start ('pre'), and pad at the start ('pre').
  3. Final Answer:

    pad_sequences(sequences, maxlen=10, padding='pre', truncating='pre') -> Option D
  4. Quick Check:

    Keep last words = truncating='pre' [OK]
Hint: Use truncating='pre' to keep last words, padding='pre' to pad start [OK]
Common Mistakes:
  • Using padding='post' which pads end instead of start
  • Using truncating='post' which removes last words
  • Confusing padding and truncating parameters
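Option D's behavior can be traced with a small pure-Python mimic of `padding='pre', truncating='pre'`. The helper `pad_and_truncate` is hypothetical; it assumes 0 as the pad value and uses slicing that matches the described semantics (`seq[-maxlen:]` for 'pre' truncation, `seq[:maxlen]` for 'post').

```python
def pad_and_truncate(seq, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical sketch of fixed-length padding plus truncation (assumes maxlen > 0)."""
    seq = list(seq)
    # 'pre' truncation drops items from the start, keeping the last maxlen items
    seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    return fill + seq if padding == "pre" else seq + fill


long_seq = list(range(1, 13))   # 12 tokens: 1..12
short_seq = [7, 8, 9]
print(pad_and_truncate(long_seq, 10))   # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(pad_and_truncate(short_seq, 10))  # [0, 0, 0, 0, 0, 0, 0, 7, 8, 9]
# With truncating='post' the last words are lost instead:
print(pad_and_truncate(long_seq, 10, truncating="post"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```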