Bird
Raised Fist0
Prompt Engineering / GenAIml~10 mins

Hugging Face fine-tuning in Prompt Engineering / GenAI - Interactive Code Practice

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Practice - 5 Tasks
Answer the questions below
1fill in blank
easy

Complete the code to load a pretrained model from Hugging Face.

Prompt Engineering / GenAI
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained('[1]')
Drag options to blanks, or click blank then click option'
Ascikit-learn
Bpytorch-lightning
Cbert-base-uncased
Dtensorflow
Attempts:
3 left
💡 Hint
Common Mistakes
Using framework names like 'tensorflow' instead of model names.
Confusing model names with unrelated libraries.
2fill in blank
medium

Complete the code to prepare the tokenizer for the model.

Prompt Engineering / GenAI
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('[1]')
Drag options to blanks, or click blank then click option'
Agpt2
Broberta-base
Cdistilbert-base-uncased
Dbert-base-uncased
Attempts:
3 left
💡 Hint
Common Mistakes
Using a tokenizer from a different model family.
Using a tokenizer that does not match the model.
3fill in blank
hard

Fix the error in the training loop by completing the missing optimizer step.

Prompt Engineering / GenAI
for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()
    [1]
    optimizer.zero_grad()
Drag options to blanks, or click blank then click option'
Aoptimizer.step()
Bloss.step()
Cmodel.step()
Dloss.backward()
Attempts:
3 left
💡 Hint
Common Mistakes
Calling loss.backward() twice.
Calling step() on loss or model instead of optimizer.
4fill in blank
hard

Fill both blanks to create a dictionary comprehension that maps words to their lengths only if length is greater than 3.

Prompt Engineering / GenAI
{word: [1] for word in words if len(word) [2] 3}
Drag options to blanks, or click blank then click option'
Alen(word)
B<
C>
Dword
Attempts:
3 left
💡 Hint
Common Mistakes
Using the word itself as value instead of its length.
Using '<' instead of '>' in the condition.
5fill in blank
hard

Fill all three blanks to create a filtered dictionary with uppercase keys, values as counts, and only include counts greater than 1.

Prompt Engineering / GenAI
filtered_counts = { [1]: [2] for [3], count in counts.items() if count > 1 }
Drag options to blanks, or click blank then click option'
Aword.upper()
Bcount
Cword
Dcount.upper()
Attempts:
3 left
💡 Hint
Common Mistakes
Using count.upper() which is invalid for integers.
Swapping keys and values in the dictionary comprehension.

Practice

(1/5)
1. What is the main purpose of fine-tuning a pre-trained model using Hugging Face?
easy
A. To adapt the model to perform well on a specific new task
B. To train a model from scratch without any prior knowledge
C. To reduce the size of the model for faster inference
D. To convert the model into a different programming language

Solution

  1. Step 1: Understand what fine-tuning means

    Fine-tuning means taking a model already trained on a large dataset and adjusting it to work well on a new, specific task.
  2. Step 2: Identify the purpose in Hugging Face context

    Hugging Face fine-tuning adapts the pre-trained model's knowledge to your task, improving accuracy without training from scratch.
  3. Final Answer:

    To adapt the model to perform well on a specific new task -> Option A
  4. Quick Check:

    Fine-tuning = adapt model to new task [OK]
Hint: Fine-tuning means adjusting a model for your task [OK]
Common Mistakes:
  • Thinking fine-tuning trains a model from scratch
  • Confusing fine-tuning with model compression
  • Assuming fine-tuning changes the programming language
2. Which of the following is the correct way to create a TrainingArguments object in Hugging Face?
easy
A. training_args = TrainArgs(directory='output', epochs=3)
B. training_args = TrainerArguments(output='output', epochs=3)
C. training_args = Training(output_dir='output', epochs=3)
D. training_args = TrainingArguments(output_dir='output', num_train_epochs=3)

Solution

  1. Step 1: Recall the correct class name and parameters

    The Hugging Face library uses the class TrainingArguments with parameters like output_dir and num_train_epochs.
  2. Step 2: Match the correct syntax

    training_args = TrainingArguments(output_dir='output', num_train_epochs=3) uses the correct class name and parameter names exactly as in the Hugging Face API.
  3. Final Answer:

    training_args = TrainingArguments(output_dir='output', num_train_epochs=3) -> Option D
  4. Quick Check:

    TrainingArguments with output_dir and num_train_epochs [OK]
Hint: Use TrainingArguments with output_dir and num_train_epochs [OK]
Common Mistakes:
  • Using wrong class names like TrainerArguments or TrainArgs
  • Using incorrect parameter names like epochs instead of num_train_epochs
  • Confusing Trainer and TrainingArguments classes
3. Given the code snippet below, what will be the output of print(len(tokenized_datasets['train'][0]['input_ids']))?
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset('imdb', split='train[:1%]')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
tokenized_datasets = dataset.map(lambda x: tokenizer(x['text'], truncation=True, padding='max_length', max_length=128))
medium
A. None, it will raise an error
B. 128
C. 512
D. variable length depending on text

Solution

  1. Step 1: Understand tokenizer parameters

    The tokenizer is called with padding='max_length' and max_length=128, so all sequences are padded or truncated to length 128.
  2. Step 2: Check the length of input_ids

    Since padding to max_length is applied, each tokenized input's input_ids list length is exactly 128.
  3. Final Answer:

    128 -> Option B
  4. Quick Check:

    Padding to max_length = fixed length 128 [OK]
Hint: Padding with max_length fixes token length [OK]
Common Mistakes:
  • Assuming variable length without padding
  • Confusing max_length with 512 default
  • Expecting error due to missing batch=True
4. You wrote this code to fine-tune a model but get an error: TypeError: Trainer() missing 1 required positional argument: 'model'. What is the likely fix?
medium
A. Change Trainer to TrainingArguments
B. Remove the 'model' argument from Trainer initialization
C. Pass the pre-trained model as the 'model' argument when creating Trainer
D. Call Trainer.train() before creating the Trainer object

Solution

  1. Step 1: Understand the error message

    The error says the Trainer constructor needs a 'model' argument but it was not provided.
  2. Step 2: Fix by providing the model

    When creating a Trainer, you must pass the pre-trained model as the 'model' parameter to avoid this error.
  3. Final Answer:

    Pass the pre-trained model as the 'model' argument when creating Trainer -> Option C
  4. Quick Check:

    Trainer requires model argument [OK]
Hint: Always pass model to Trainer constructor [OK]
Common Mistakes:
  • Forgetting to pass model to Trainer
  • Confusing Trainer with TrainingArguments
  • Calling train() before creating Trainer
5. You want to fine-tune a Hugging Face model on a small dataset but avoid overfitting. Which combination of TrainingArguments settings is best?
hard
A. Set num_train_epochs=3 and use evaluation_strategy='steps' with early stopping
B. Set num_train_epochs=10 and learning_rate=5e-5
C. Set batch_size=1 and disable evaluation
D. Set num_train_epochs=1 and learning_rate=1.0

Solution

  1. Step 1: Identify overfitting prevention methods

    Using fewer epochs and evaluation with early stopping helps stop training before overfitting.
  2. Step 2: Evaluate options for best practice

    Set num_train_epochs=3 and use evaluation_strategy='steps' with early stopping sets a moderate number of epochs and enables evaluation with early stopping, which is best to avoid overfitting.
  3. Final Answer:

    Set num_train_epochs=3 and use evaluation_strategy='steps' with early stopping -> Option A
  4. Quick Check:

    Early stopping + moderate epochs prevent overfitting [OK]
Hint: Use early stopping and moderate epochs to avoid overfitting [OK]
Common Mistakes:
  • Using too many epochs causing overfitting
  • Setting learning rate too high or too low
  • Ignoring evaluation and early stopping