Practice


1. What is the main purpose of fine-tuning a pre-trained model using Hugging Face?

easy

A. To adapt the model to perform well on a specific new task

B. To train a model from scratch without any prior knowledge

C. To reduce the size of the model for faster inference

D. To convert the model into a different programming language

Solution

Step 1: Understand what fine-tuning means
Fine-tuning means taking a model already trained on a large dataset and adjusting it to work well on a new, specific task.
Step 2: Identify the purpose in Hugging Face context
Hugging Face fine-tuning adapts the pre-trained model's knowledge to your task, improving accuracy without training from scratch.
Final Answer:
To adapt the model to perform well on a specific new task -> Option A
Quick Check:
Fine-tuning = adapt model to new task [OK]

Hint: Fine-tuning means adjusting a model for your task [OK]

Common Mistakes:

Thinking fine-tuning trains a model from scratch
Confusing fine-tuning with model compression
Assuming fine-tuning changes the programming language

2. Which of the following is the correct way to create a TrainingArguments object in Hugging Face?

easy

A. training_args = TrainArgs(directory='output', epochs=3)

B. training_args = TrainerArguments(output='output', epochs=3)

C. training_args = Training(output_dir='output', epochs=3)

D. training_args = TrainingArguments(output_dir='output', num_train_epochs=3)

Solution

Step 1: Recall the correct class name and parameters
The Hugging Face library uses the class TrainingArguments with parameters like output_dir and num_train_epochs.
Step 2: Match the correct syntax
training_args = TrainingArguments(output_dir='output', num_train_epochs=3) uses the correct class name and parameter names exactly as in the Hugging Face API.
Final Answer:
training_args = TrainingArguments(output_dir='output', num_train_epochs=3) -> Option D
Quick Check:
TrainingArguments with output_dir and num_train_epochs [OK]

Hint: Use TrainingArguments with output_dir and num_train_epochs [OK]

Common Mistakes:

Using wrong class names like TrainerArguments or TrainArgs
Using incorrect parameter names like epochs instead of num_train_epochs
Confusing Trainer and TrainingArguments classes
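
As a minimal sketch of the correct call, the snippet below keeps the two keyword arguments from Option D in a plain dictionary and passes them to TrainingArguments. The names TRAINING_KWARGS and make_training_args are hypothetical helpers introduced only for this illustration. The import is done inside the function so the snippet can be read and loaded even where transformers is not installed; actually building the object assumes transformers and a PyTorch backend are available.

```python
# Keyword arguments spelled exactly as in Option D.
# TRAINING_KWARGS is a hypothetical name used only in this sketch.
TRAINING_KWARGS = {
    "output_dir": "output",   # where checkpoints and logs are written
    "num_train_epochs": 3,    # note: not "epochs"
}


def make_training_args():
    """Build a TrainingArguments object from the kwargs above.

    The import is lazy so that merely loading this sketch does not
    require transformers to be installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

Calling make_training_args() returns the same object as writing TrainingArguments(output_dir='output', num_train_epochs=3) directly.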

3. Given the code snippet below, what will be the output of print(len(tokenized_datasets['train'][0]['input_ids']))?

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset('imdb', split='train[:1%]')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenized_datasets = dataset.map(lambda x: tokenizer(x['text'], truncation=True, padding='max_length', max_length=128))

medium

A. None, it will raise an error

B. 128

C. 512

D. variable length depending on text

Solution

Step 1: Understand tokenizer parameters
The tokenizer is called with truncation=True, padding='max_length', and max_length=128, so longer texts are truncated to 128 tokens and shorter ones are padded up to 128.
Step 2: Check the length of input_ids
Since padding to max_length is applied, each tokenized input's input_ids list length is exactly 128.
Final Answer:
128 -> Option B
Quick Check:
Padding to max_length = fixed length 128 [OK]

Hint: Padding with max_length fixes token length [OK]

Common Mistakes:

Assuming the length varies even though padding='max_length' is set
Confusing max_length=128 with the model's 512-token limit
Expecting an error because batched=True is not passed to map()
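
The fixed length can be illustrated without the library. The sketch below is a simplified stand-in for what truncation plus padding='max_length' does to a list of token ids; pad_or_truncate and PAD_ID are hypothetical names for this illustration, and a real tokenizer also handles special tokens and attention masks, which this sketch ignores.

```python
from typing import List

PAD_ID = 0  # assumed padding id, for illustration only


def pad_or_truncate(token_ids: List[int], max_length: int,
                    pad_id: int = PAD_ID) -> List[int]:
    """Return exactly max_length ids: cut long inputs, right-pad short ones."""
    clipped = token_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))


short = pad_or_truncate([101, 2023, 102], max_length=128)    # 3 ids in
long = pad_or_truncate(list(range(1, 301)), max_length=128)  # 300 ids in

print(len(short), len(long))  # prints: 128 128
```

Whatever the input length, the result has 128 entries, which is why Option B holds for every example in the dataset.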

4. You wrote this code to fine-tune a model but get an error: TypeError: Trainer() missing 1 required positional argument: 'model'. What is the likely fix?

medium

A. Change Trainer to TrainingArguments

B. Remove the 'model' argument from Trainer initialization

C. Pass the pre-trained model as the 'model' argument when creating Trainer

D. Call Trainer.train() before creating the Trainer object

Solution

Step 1: Understand the error message
The error says the Trainer constructor needs a 'model' argument but it was not provided.
Step 2: Fix by providing the model
When creating a Trainer, you must pass the pre-trained model as the 'model' parameter to avoid this error.
Final Answer:
Pass the pre-trained model as the 'model' argument when creating Trainer -> Option C
Quick Check:
Trainer requires model argument [OK]

Hint: Always pass model to Trainer constructor [OK]

Common Mistakes:

Forgetting to pass model to Trainer
Confusing Trainer with TrainingArguments
Calling train() before creating Trainer
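
One way to guard against this mistake is a small wrapper that checks for the model before constructing the Trainer. This is only a sketch: build_trainer is a hypothetical helper, not part of transformers, and the import is done lazily so the check itself runs without the library installed.

```python
def build_trainer(model, training_args, train_dataset=None, eval_dataset=None):
    """Create a Trainer, failing early with a clear message if no model is given."""
    if model is None:
        raise ValueError(
            "Trainer needs a model: pass model=<your pre-trained model>"
        )

    # Lazy import: only needed once a model has actually been supplied.
    from transformers import Trainer

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )


# Typical use (assumes transformers is installed and a model was loaded, e.g.
# with AutoModelForSequenceClassification.from_pretrained(...)):
#   trainer = build_trainer(model, training_args, train_dataset)
#   trainer.train()
```

The order matters: the model is loaded first, the Trainer is created with it, and only then is train() called.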

5. You want to fine-tune a Hugging Face model on a small dataset but avoid overfitting. Which combination of TrainingArguments settings is best?

hard

A. Set num_train_epochs=3 and use evaluation_strategy='steps' with early stopping

B. Set num_train_epochs=10 and learning_rate=5e-5

C. Set batch_size=1 and disable evaluation

D. Set num_train_epochs=1 and learning_rate=1.0

Solution

Step 1: Identify overfitting prevention methods
Training for a moderate number of epochs and evaluating regularly with early stopping lets training halt before the model starts to overfit.
Step 2: Evaluate options for best practice
Option A sets a moderate number of epochs and enables periodic evaluation with early stopping, so it is the best choice. Note that in transformers, early stopping is added through EarlyStoppingCallback on the Trainer rather than through a TrainingArguments field, and evaluation_strategy is named eval_strategy in recent releases.
Final Answer:
Set num_train_epochs=3 and use evaluation_strategy='steps' with early stopping -> Option A
Quick Check:
Early stopping + moderate epochs prevent overfitting [OK]

Hint: Use early stopping and moderate epochs to avoid overfitting [OK]

Common Mistakes:

Using too many epochs causing overfitting
Setting learning rate too high or too low
Ignoring evaluation and early stopping
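
The idea behind early stopping can be sketched in plain Python. The function below is a simplified illustration of patience-based stopping, not the library's implementation; stopping_epoch and the loss values are made up for this example. In transformers, the equivalent behaviour is provided by EarlyStoppingCallback, passed to the Trainer through its callbacks argument.

```python
from typing import List


def stopping_epoch(val_losses: List[float], patience: int = 2) -> int:
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive evaluations; otherwise it runs to the end.
    """
    best = float("inf")
    bad_evals = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss        # new best: reset the counter
            bad_evals = 0
        else:
            bad_evals += 1     # no improvement at this evaluation
            if bad_evals >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: improving for 3 epochs, then getting worse.
losses = [0.65, 0.42, 0.30, 0.33, 0.36, 0.40]
print(stopping_epoch(losses, patience=2))  # prints: 5
```

With these values, training halts at epoch 5 instead of running all 6 epochs, because the loss failed to improve at epochs 4 and 5.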

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.65	0.70	Model starts learning, loss decreases from initial high value
2	0.42	0.82	Loss decreases further, accuracy improves significantly
3	0.30	0.88	Model converges with low loss and high accuracy

Hugging Face fine-tuning in Prompt Engineering / GenAI - Model Pipeline Trace
