NLP · ML · ~20 mins

Padding and sequence length in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Why do we pad sequences in NLP models?

In natural language processing, why is padding used when preparing sequences for models?

A. To make all sequences the same length so they can be processed in batches.
B. To convert text into numerical values.
C. To increase the vocabulary size of the model.
D. To remove stop words from the sequences.
💡 Hint

Think about how models handle input data in batches.
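As a concrete illustration: batched tensors are rectangular, so every row must have the same length. The helper below is a hypothetical pure-Python sketch of that idea, not the Keras API itself.

```python
# Hypothetical helper illustrating why padding enables batching:
# a batch is a rectangular array, so shorter sequences are filled
# with a pad value (conventionally 0) up to the longest length.
def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]

batch = pad_batch([[1, 2, 3], [4, 5], [6]])
print(batch)  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]] -- every row has length 3
```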

Predict Output · intermediate
Output length after padding sequences

Given the following code that pads sequences to a max length of 5, what is the output?

from tensorflow.keras.preprocessing.sequence import pad_sequences
sequences = [[1, 2, 3], [4, 5], [6]]
padded = pad_sequences(sequences, maxlen=5, padding='post')
print(padded.tolist())
A. [[1, 2, 3], [4, 5], [6]]
B. [[0, 0, 1, 2, 3], [0, 0, 0, 4, 5], [0, 0, 0, 0, 6]]
C. [[1, 2, 3, 0, 0], [4, 5, 0, 0, 0], [6, 0, 0, 0, 0]]
D. [[1, 2, 3, 0], [4, 5, 0], [6, 0, 0]]
💡 Hint

Check the padding='post' argument and maxlen=5.
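To see what the `padding` argument controls, here is a minimal pure-Python sketch of Keras-style padding (a hypothetical `pad_to` helper, run on a different toy input so it doesn't give the answer away):

```python
def pad_to(seq, maxlen, padding="pre", pad_value=0):
    """Sketch of Keras-style padding: 'pre' (the pad_sequences default)
    puts zeros before the tokens, 'post' puts them after.
    Assumes seq is no longer than maxlen."""
    fill = [pad_value] * (maxlen - len(seq))
    return fill + seq if padding == "pre" else seq + fill

print(pad_to([7, 8], 4))                  # 'pre':  [0, 0, 7, 8]
print(pad_to([7, 8], 4, padding="post"))  # 'post': [7, 8, 0, 0]
```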

Model Choice · advanced
Choosing sequence length for RNN training

You have text sequences of varying lengths from 10 to 100 tokens. You want to train an RNN model efficiently. Which sequence length choice is best?

A. Pad all sequences to length 100 to keep full information.
B. Pad sequences to the median length around 50 and truncate longer ones.
C. Pad all sequences to length 10 to speed up training.
D. Do not pad sequences and train with variable lengths.
💡 Hint

Consider balancing information retention and training speed.
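The trade-off can be sketched with a hypothetical `fit_length` helper that pads short sequences and truncates long ones to one target length (truncating from the end here; note that Keras's `pad_sequences` truncates from the front by default):

```python
def fit_length(seq, target_len, pad_value=0):
    """Pad short sequences and cut long ones to a single target length."""
    if len(seq) >= target_len:
        return seq[:target_len]          # drop tokens past the target
    return seq + [pad_value] * (target_len - len(seq))  # fill the rest

# With lengths from 10 to 100, a target near the median keeps most real
# tokens while avoiding mostly-zero rows for the short sequences.
print(fit_length(list(range(8)), 5))  # truncated: [0, 1, 2, 3, 4]
print(fit_length([1, 2, 3], 5))       # padded:    [1, 2, 3, 0, 0]
```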

Metrics · advanced
Effect of padding on model accuracy

When training a text classification model, how can excessive padding affect accuracy?

A. Padding causes the model to overfit the training data.
B. Padding has no effect on accuracy since it is ignored by the model.
C. More padding always improves accuracy by standardizing input size.
D. Excessive padding can introduce noise and reduce accuracy.
💡 Hint

Think about how meaningless zeros might affect learning.
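One way to see the effect: if padded positions are not masked out, they get averaged into the representation. The toy example below (hypothetical 1-D "embeddings"; real models mask padding similarly, e.g. via `mask_zero=True` on a Keras `Embedding` layer) shows pad zeros diluting a mean-pooled value:

```python
def mean_pool(values, mask=None):
    """Average token values; with a mask, padded positions are excluded."""
    if mask is None:
        return sum(values) / len(values)
    kept = [v for v, m in zip(values, mask) if m]
    return sum(kept) / len(kept)

tokens = [2.0, 4.0, 0.0, 0.0, 0.0]  # two real tokens + three pad zeros
print(mean_pool(tokens))                         # 1.2 -- zeros drag the mean down
print(mean_pool(tokens, mask=[1, 1, 0, 0, 0]))   # 3.0 -- masking ignores padding
```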

🔧 Debug · expert
Why does this padding code truncate unexpectedly?

Consider this code snippet:

from tensorflow.keras.preprocessing.sequence import pad_sequences
sequences = [[1, 2], [3, 4, 5, 6]]
padded = pad_sequences(sequences, maxlen=3, padding='post')
print(padded)

Why does this code truncate unexpectedly?

A. Because maxlen is smaller than the length of one sequence, causing truncation without specifying truncating='post'.
B. Because pad_sequences requires all sequences to be the same length before padding.
C. Because sequences contain integers instead of strings.
D. Because padding='post' is invalid and should be 'pre'.
💡 Hint

Check the maxlen parameter and sequence lengths.
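The length-handling rule can be sketched in plain Python (a hypothetical `pad_or_truncate` helper, not the Keras function itself): `pad_sequences` cuts any sequence longer than `maxlen`, and its `truncating` default is `'pre'`, i.e. it drops tokens from the front regardless of the `padding` mode.

```python
def pad_or_truncate(seq, maxlen, truncating="pre", pad_value=0):
    """Sketch of pad_sequences' length handling: over-length sequences are
    cut, from the front by default ('pre'); short ones are right-padded."""
    if len(seq) > maxlen:
        return seq[:maxlen] if truncating == "post" else seq[-maxlen:]
    return seq + [pad_value] * (maxlen - len(seq))

print(pad_or_truncate([3, 4, 5, 6], 3))                     # [4, 5, 6]
print(pad_or_truncate([3, 4, 5, 6], 3, truncating="post"))  # [3, 4, 5]
print(pad_or_truncate([1, 2], 3))                           # [1, 2, 0]
```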