Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to load the multilingual sentiment dataset using Hugging Face datasets.
NLP
from datasets import load_dataset
dataset = load_dataset([1], 'all_languages')
Common Mistakes
Choosing 'imdb', which is English-only.
Using 'glue', which is a benchmark of English language-understanding tasks.
Selecting 'squad', which is a question-answering dataset.
Answer: 'amazon_reviews_multi'. This dataset contains product reviews in several languages together with star ratings, which makes it suitable for sentiment analysis.
2. Fill in the blank (medium)
Complete the code to tokenize the input text for multilingual sentiment analysis using a pretrained tokenizer.
NLP
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained([1])
tokens = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
Common Mistakes
Using 'bert-base-uncased', which is English-only.
Choosing 'distilbert-base-uncased', which is English-only.
Selecting 'roberta-base', which is English-only.
Answer: 'xlm-roberta-base'. Its tokenizer was trained on text in many languages, so it suits multilingual sentiment tasks.
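To make the padding=True and truncation=True arguments in this task concrete, here is a minimal pure-Python sketch of what they do to a batch of token ids. It is a conceptual illustration, not the transformers implementation: the helper name pad_and_truncate, the pad id of 0, and the token ids are all made up for the example, and real tokenizers also handle special tokens.

```python
# Conceptual sketch only: mimics the effect of padding=True and truncation=True
# on a batch of token-id lists. Not the transformers implementation.

def pad_and_truncate(batch, max_length, pad_id=0):
    """Truncate each sequence to max_length, then pad to the longest one left."""
    truncated = [ids[:max_length] for ids in batch]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token and 0 marks padding, like an attention mask.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return input_ids, attention_mask


# Hypothetical token ids for three reviews of different lengths.
batch = [[5, 6, 7], [8, 9], [10, 11, 12, 13, 14, 15]]
input_ids, attention_mask = pad_and_truncate(batch, max_length=4)
print(input_ids)
print(attention_mask)
```

Every row ends up the same length, which is what allows the batch to be returned as a single tensor when return_tensors='pt' is set.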
3. Fill in the blank (hard)
Fix the error in the model loading code for multilingual sentiment classification.
NLP
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained([1], num_labels=3)
Common Mistakes
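Using 'bert-base-uncased', which is English-only.
Choosing 'xlm-roberta-base', which is multilingual but is not the checkpoint this exercise expects; like any base checkpoint, it still needs fine-tuning for classification.
Selecting 'distilbert-base-uncased', which is English-only.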
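Answer: 'bert-base-multilingual-cased'. This checkpoint covers many languages, and num_labels=3 attaches a three-way classification head. That head is newly initialized, so the model still has to be fine-tuned on labeled sentiment data before its predictions are meaningful.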
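With num_labels=3 the classifier produces three scores (logits) per input, and the predicted class is the index of the largest one. The sketch below shows that step in plain Python under stated assumptions: the label names in ID2LABEL, the helper predict_label, and the logits are hypothetical, since the exercise does not name the three classes.

```python
# Hypothetical label names for num_labels=3; the exercise does not specify them.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def predict_label(logits):
    """Return the index of the largest logit (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up logits for two reviews, one score per label.
batch_logits = [[-1.2, 0.3, 2.1], [1.7, 0.2, -0.9]]
predictions = [predict_label(row) for row in batch_logits]
print([ID2LABEL[i] for i in predictions])
```

A list of class indices like predictions is the kind of input the accuracy calculation in task 5 compares against the true labels.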
4. Fill in the blank (hard)
Fill both blanks to create a dictionary comprehension that maps each review text to its sentiment label.
NLP
sentiment_dict = { [1]: [2] for example in dataset['train'] }
Common Mistakes
Choosing 'text', which is not a column in this dataset.
Using 'sentiment', which is not a column in this dataset.
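Answer: example['review_body'] and example['star_rating']. The exercise treats 'review_body' as the review text and 'star_rating' (1 to 5 stars) as the sentiment label, so the dictionary maps each text to its star rating. Column names differ between datasets (the Hub release of 'amazon_reviews_multi' calls the rating column 'stars'), so check dataset['train'].column_names before relying on them.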
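The comprehension can be tried without downloading anything by using a short list of dictionaries in place of dataset['train']. This is a sketch under assumptions: the reviews are invented, and the field names simply follow this exercise, so they should be confirmed against the real dataset's column names.

```python
# Field names as used in this exercise; confirm against the real dataset's
# column_names before relying on them.
TEXT_FIELD = "review_body"
RATING_FIELD = "star_rating"

# Stand-in for dataset['train']: made-up reviews in three languages.
train = [
    {TEXT_FIELD: "Great product", RATING_FIELD: 5},
    {TEXT_FIELD: "Producto defectuoso", RATING_FIELD: 1},
    {TEXT_FIELD: "Ganz okay", RATING_FIELD: 3},
]

# Map each review text to its star rating.
sentiment_dict = {example[TEXT_FIELD]: example[RATING_FIELD] for example in train}
print(sentiment_dict)
```

Because dictionary keys are unique, two reviews with identical text collapse into one entry that keeps the rating seen last.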
5. Fill in the blank (hard)
Fill both blanks to compute the accuracy of the model predictions.
NLP
correct = sum(1 for pred, true in zip(predictions, labels) if pred [1] true)
accuracy = correct [2] len(labels)
print(f'Accuracy: {accuracy:.2f}')
Common Mistakes
Using '!=' instead of '==', which counts the mismatches instead of the matches.
Multiplying by len(labels) instead of dividing by it.
Answer: '==' and '/'. Accuracy is the number of predictions that equal the true label divided by the total number of labels.
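As a worked example of the calculation, the sketch below wraps the same counting logic in a small function and runs it on made-up data. The function name compute_accuracy, the input checks, and the sample predictions and labels are illustrative additions, not part of the exercise.

```python
def compute_accuracy(predictions, labels):
    """Fraction of predictions that equal the corresponding true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(1 for pred, true in zip(predictions, labels) if pred == true)
    return correct / len(labels)


# Made-up class indices: three of the four predictions match.
predictions = [2, 0, 1, 2]
labels = [2, 0, 2, 2]
accuracy = compute_accuracy(predictions, labels)
print(f"Accuracy: {accuracy:.2f}")
```

The length check matters because zip stops at the shorter input, which would otherwise hide missing predictions while still dividing by the full label count.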