NLP / ML · ~20 mins

T5 for text-to-text tasks in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
T5 Text-to-Text Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
🧠 Conceptual
intermediate
What is the main advantage of T5's text-to-text framework?
T5 treats every problem as a text-to-text task. What is the main advantage of this approach?
A. It allows using the same model architecture and training procedure for many different NLP tasks.
B. It requires separate models for each task to improve accuracy.
C. It only works for translation tasks and not for classification.
D. It eliminates the need for any pre-training on large datasets.
💡 Hint
Think about how treating all tasks as text input and output simplifies the model design.
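As an aside, the unification in option A can be sketched in plain Python, with no model required. The prefixes below follow the convention T5 uses ("summarize:", "translate English to German:", etc.); the helper function itself is illustrative only, not part of any library:

```python
# A minimal sketch of T5's text-to-text framing: every task becomes a
# prefixed input string, and every answer (even a class label) is plain text,
# so one model and one training loop cover all of them.
def to_text_to_text(task: str, text: str) -> str:
    """Format an NLP task as a single text input for a T5-style model."""
    prefixes = {
        "summarization": "summarize: ",
        "translation_en_de": "translate English to German: ",
        "sentiment": "sst2 sentence: ",  # the classification label comes back as text
    }
    return prefixes[task] + text

print(to_text_to_text("summarization", "The cat sat on the mat."))
# summarize: The cat sat on the mat.
```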
Predict Output
intermediate
Output of T5 model generating summary
Given the input text: "summarize: The cat sat on the mat and looked outside." What is the most likely output of a T5 model fine-tuned for summarization?
Python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

input_text = 'summarize: The cat sat on the mat and looked outside.'
input_ids = tokenizer(input_text, return_tensors='pt').input_ids

outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
A. "The mat was sat on by the cat."
B. "The cat sat on the mat."
C. "The cat sat on the mat and looked outside."
D. "The cat looked outside."
💡 Hint
Summarization usually shortens and keeps the main idea.
Hyperparameter
advanced
Choosing max_length for T5 text generation
When generating text with T5, which max_length value is best to avoid cutting off important output while keeping generation efficient?
A. Set max_length to a very large number like 1000 to ensure full output.
B. Set max_length to a small number like 5 to speed up generation.
C. Set max_length to a reasonable length based on expected output size, e.g., 50 for summaries.
D. Do not set max_length; let the model generate indefinitely.
💡 Hint
Think about balancing output completeness and speed.
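One practical way to choose a "reasonable" max_length is to derive it from the length distribution of reference outputs in your data rather than guessing. The sketch below is illustrative only (the `pick_max_length` helper and its 20% headroom margin are assumptions, not a library API), and it counts whitespace words rather than real tokens:

```python
# Derive max_length from reference summaries: cover the longest one,
# plus a little headroom, instead of an arbitrary constant.
def pick_max_length(reference_summaries, margin=1.2):
    """Return a generation length covering the longest reference, with headroom."""
    longest = max(len(s.split()) for s in reference_summaries)
    return int(longest * margin)

refs = ["cat sat on mat", "dog barked at mailman all morning"]
print(pick_max_length(refs))  # longest is 6 words; 6 * 1.2 -> 7
```

In practice you would compute this over real tokenizer output and pass the result as `max_length` (or `max_new_tokens`) to `model.generate`.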
Metrics
advanced
Evaluating T5 model performance on summarization
Which metric is most appropriate to evaluate the quality of summaries generated by a T5 model?
A. BLEU score - measures overlap of n-grams between generated and reference text.
B. Accuracy - percentage of exact matches with reference summaries.
C. Mean Squared Error - difference between predicted and actual token IDs.
D. Perplexity - measures how well the model predicts the next token.
💡 Hint
Think about metrics that compare generated text to reference text.
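To make the n-gram-overlap idea behind option A concrete, here is a toy unigram precision in pure Python. Real evaluations use libraries such as sacrebleu or rouge_score with higher-order n-grams and brevity penalties; this stripped-down version is for intuition only:

```python
# Toy BLEU-style score: what fraction of the candidate's words also appear
# in the reference (clipped so a repeated word can't over-count)?
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return matched / len(cand)

print(unigram_precision("the cat sat", "the cat sat on the mat"))  # 1.0
print(unigram_precision("a dog ran", "the cat sat on the mat"))    # 0.0
```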
🔧 Debug
expert
Why does T5 generate repetitive text?
You fine-tuned a T5 model for text generation, but the output repeats the same phrase multiple times. What is the most likely cause?
A. The model's max_length is too short, causing repetition.
B. The decoding strategy lacks diversity, e.g., greedy decoding without penalties.
C. The beam search size is too large, causing repeated phrases.
D. The tokenizer vocabulary is too small, causing repetition.
💡 Hint
Consider how decoding methods affect output variety.
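The fix that option B points at can be seen in miniature without loading a model. The sketch below uses a toy "model" that always prefers the same next token, then blocks any token that would recreate an already-seen n-gram; this is the idea behind the `no_repeat_ngram_size` argument to `model.generate` in Hugging Face transformers (the `decode` function here is a hypothetical illustration, not that implementation):

```python
# Toy greedy decoder: without n-gram blocking it loops forever on its
# favorite token; with blocking it is forced to diversify.
def decode(next_token_ranking, steps=6, no_repeat_ngram_size=2):
    out = []
    for _ in range(steps):
        banned = set()
        if no_repeat_ngram_size and len(out) >= no_repeat_ngram_size - 1:
            prefix = tuple(out[-(no_repeat_ngram_size - 1):])
            # Ban any token that would repeat an n-gram already generated.
            for i in range(len(out) - no_repeat_ngram_size + 1):
                if tuple(out[i:i + no_repeat_ngram_size - 1]) == prefix:
                    banned.add(out[i + no_repeat_ngram_size - 1])
        out.append(next(t for t in next_token_ranking if t not in banned))
    return out

ranking = ["la", "di", "da"]
print(decode(ranking, no_repeat_ngram_size=0))  # ['la', 'la', 'la', 'la', 'la', 'la']
print(decode(ranking, no_repeat_ngram_size=2))  # ['la', 'la', 'di', 'la', 'da', 'la']
```

Sampling strategies (`do_sample=True` with `top_k` or `top_p`) and `repetition_penalty` are the other common ways to add the diversity that plain greedy decoding lacks.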