
Sentiment analysis pipeline in NLP - ML Experiment: Train & Evaluate

Experiment - Sentiment analysis pipeline
Problem: Build a sentiment analysis model to classify movie reviews as positive or negative.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Training loss: 0.15, Validation loss: 0.65
Issue: The model is overfitting: training accuracy is very high, but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You can only modify the model architecture and training hyperparameters.
Do not change the dataset or preprocessing steps.
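The success criteria can be written as a small check to run against the printed metrics. This is a minimal sketch; the helper names `generalization_gap` and `meets_targets` are hypothetical, and accuracies are assumed to be fractions in [0, 1], as Keras reports them.

```python
def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Return the train/validation accuracy gap in percentage points."""
    return round((train_acc - val_acc) * 100, 2)


def meets_targets(train_acc: float, val_acc: float,
                  min_val: float = 0.85, max_train: float = 0.92) -> bool:
    """Check the task's goals: validation >= 85% and training < 92%."""
    return val_acc >= min_val and train_acc < max_train


# Current metrics from the problem statement: 95% training, 70% validation
print(generalization_gap(0.95, 0.70), meets_targets(0.95, 0.70))  # → 25.0 False
```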
Solution
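import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb

# Load data: keep the 10,000 most frequent words, cap reviews at 200 tokens
max_features = 10000
max_len = 200
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

# Pad (or truncate) every review to exactly max_len token ids
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

# Build model with dropout and reduced units (64 LSTM units instead of 128)
model = Sequential([
    tf.keras.Input(shape=(max_len,)),  # declares the input shape (Keras 3 deprecates Embedding's input_length)
    Embedding(max_features, 64),
    LSTM(64, return_sequences=False),  # return only the final hidden state
    Dropout(0.5),                      # zero 50% of LSTM outputs, during training only
    Dense(1, activation='sigmoid')     # probability that the review is positive
])

# 1e-4 is 10x lower than Adam's default learning rate of 1e-3
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Early stopping: halt after 3 epochs without val_loss improvement,
# then roll back to the weights from the best epoch
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train model; validation_split=0.2 holds out the last 20% of X_train to compute val_loss
history = model.fit(X_train, y_train, epochs=20, batch_size=64, validation_split=0.2, callbacks=[early_stop])

# Evaluate. X_test is the separate IMDB test split, never used for training
# or early stopping; its accuracy is what this script reports as validation accuracy.
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
val_loss, val_acc = model.evaluate(X_test, y_test, verbose=0)

print(f'Training accuracy: {train_acc*100:.2f}%, Validation accuracy: {val_acc*100:.2f}%')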
Added a Dropout layer with rate 0.5 after the LSTM layer to reduce overfitting.
Reduced LSTM units from 128 to 64 to simplify the model.
Lowered learning rate to 0.0001 for more stable training.
Added early stopping to stop training when validation loss stops improving.
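Early stopping with `patience=3` and `restore_best_weights=True` can be traced by hand. The sketch below is a simplified plain-Python model of that behavior, not Keras's own implementation; the function name `early_stopping_epoch` and the validation-loss values in the example are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Simplified model of EarlyStopping(monitor='val_loss', restore_best_weights=True).

    Returns (best_epoch, stopped_epoch), both 0-based: the epoch whose
    weights would be restored, and the last epoch that actually ran.
    """
    if not val_losses:
        raise ValueError("need at least one epoch of validation loss")
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop; later epochs never run
    return best_epoch, len(val_losses) - 1


# Hypothetical per-epoch validation losses
print(early_stopping_epoch([0.60, 0.50, 0.45, 0.47, 0.49, 0.52, 0.40]))  # → (2, 5)
```

Training stops after the third epoch without improvement, so the later 0.40 is never reached; a larger `patience` trades longer training for a chance at such late improvements.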
Results Interpretation

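Before: Training accuracy 95%, Validation accuracy 70%: a 25-point gap, indicating strong overfitting.

After: Training accuracy 90%, Validation accuracy 87%: the gap shrinks to 3 points, and both targets (validation at least 85%, training below 92%) are met.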

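Dropout and a smaller LSTM limit how much the model can memorize the training set, while the lower learning rate makes progress between epochs gradual, so early stopping can restore weights from close to the epoch with the lowest validation loss. Together these changes reduce overfitting and improve validation accuracy.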
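Keras's `Dropout` layer uses inverted dropout: during training it zeroes each input with probability `rate` and scales the survivors by `1 / (1 - rate)`, so the expected activation is unchanged; at inference (`evaluate`, `predict`) inputs pass through untouched. A minimal plain-Python sketch of that idea on a list of floats, using a hypothetical helper named `dropout`:

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Inverted dropout on a list of floats (simplified sketch of Dropout(rate))."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - rate)  # keeps the expected value of each unit the same
    return [0.0 if rng.random() < rate else v * scale for v in values]
```

With `rate=0.5`, every surviving value is doubled, which is why the average output stays close to the average input even though about half the units are zeroed.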
Bonus Experiment
Try using a pretrained word embedding like GloVe instead of training embeddings from scratch.
💡 Hint
Load GloVe embeddings and set them as weights in the Embedding layer with trainable=False.
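A minimal plain-Python sketch of this hint, under stated assumptions: the GloVe text format is one word per line followed by its vector components separated by spaces; `word_index` is the word-to-rank dictionary from `imdb.get_word_index()`; and `index_offset=3` assumes `imdb.load_data`'s default `index_from=3`, which shifts every rank by 3 in the loaded sequences. The helper names `load_glove` and `build_embedding_matrix` are hypothetical.

```python
def load_glove(lines):
    """Parse GloVe text format: 'word v1 v2 ...' on each line."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, num_words, dim, index_offset=3):
    """Return a num_words x dim matrix; row i holds the vector for token id i.

    Assumes token id = rank + index_offset (imdb.load_data default index_from=3).
    Ids without a GloVe vector, and the reserved ids 0-2, stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(num_words)]
    for word, rank in word_index.items():
        token_id = rank + index_offset
        vec = vectors.get(word)
        if token_id < num_words and vec is not None and len(vec) == dim:
            matrix[token_id] = list(vec)
    return matrix
```

In practice the lines would come from a downloaded file such as `glove.6B.100d.txt`; the matrix can then be converted to a NumPy array and loaded into the layer, for example with `Embedding(max_features, 100, embeddings_initializer=tf.keras.initializers.Constant(matrix), trainable=False)`. Note that the embedding size must match the GloVe dimension (100 here instead of 64).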

Practice

1. What is the main purpose of a sentiment analysis pipeline in natural language processing?
easy
A. To automatically detect feelings or opinions in text
B. To translate text from one language to another
C. To count the number of words in a sentence
D. To generate new text based on input

Solution

  1. Step 1: Understand the goal of sentiment analysis

    Sentiment analysis is about finding emotions or opinions in text data.
  2. Step 2: Identify the pipeline's role

    A sentiment analysis pipeline automates this process to detect feelings like positive or negative.
  3. Final Answer:

    To automatically detect feelings or opinions in text -> Option A
  4. Quick Check:

    Sentiment analysis = detect feelings [OK]
Hint: Sentiment analysis finds emotions in text fast [OK]
Common Mistakes:
  • Confusing sentiment analysis with translation
  • Thinking it counts words instead of feelings
  • Assuming it generates new text
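To make the idea of a pipeline concrete, here is a toy lexicon-based version in plain Python: raw text goes through normalization and tokenization, then scoring, then a label. The word lists and function names are hypothetical and for illustration only; real pipelines, such as the Hugging Face one below, replace the word counting with a trained model.

```python
import re

# Hypothetical, hand-picked word lists (illustration only)
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "boring"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(tokens):
    """Number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def classify(text):
    """Run the pipeline: text -> tokens -> score -> label."""
    s = score(tokenize(text))
    if s > 0:
        return "POSITIVE"
    if s < 0:
        return "NEGATIVE"
    return "NEUTRAL"


print(classify("I love sunny days!"))  # → POSITIVE
```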
2. Which of the following is the correct way to create a sentiment analysis pipeline using the Hugging Face Transformers library in Python?
easy
A. pipeline = Pipeline('text-classification')
B. pipeline = create_pipeline('sentiment')
C. pipeline = sentiment_pipeline()
D. pipeline = pipeline('sentiment-analysis')

Solution

  1. Step 1: Recall the Hugging Face pipeline syntax

    The factory function is pipeline (imported from transformers), called with the task name as a string.
  2. Step 2: Match the exact task name for sentiment analysis

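    The task name is 'sentiment-analysis', so pipeline('sentiment-analysis') is correct. Option D runs, but assigning the result back to the name pipeline shadows the imported function; in real code, prefer a name such as classifier.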
  3. Final Answer:

    pipeline = pipeline('sentiment-analysis') -> Option D
  4. Quick Check:

    Use pipeline('sentiment-analysis') to create sentiment pipeline [OK]
Hint: Use pipeline('sentiment-analysis') exactly [OK]
Common Mistakes:
  • Using wrong function names like create_pipeline
  • Missing quotes around task name
  • Using incorrect task names like 'sentiment'
3. What will be the output of this Python code using Hugging Face's sentiment analysis pipeline?
from transformers import pipeline
sentiment = pipeline('sentiment-analysis')
result = sentiment('I love sunny days!')
print(result)
medium
A. [{'label': 'NEGATIVE', 'score': 0.99}]
B. [{'label': 'POSITIVE', 'score': 0.99}]
C. SyntaxError
D. []

Solution

  1. Step 1: Understand the input text sentiment

    The sentence 'I love sunny days!' expresses a positive feeling.
  2. Step 2: Predict output from sentiment pipeline

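    The pipeline returns a list with one dictionary per input, here containing the label 'POSITIVE' and a high confidence score. The exact score depends on the default model and is printed with more decimal places, so 0.99 is an approximation.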
  3. Final Answer:

    [{'label': 'POSITIVE', 'score': 0.99}] -> Option B
  4. Quick Check:

    Positive sentence = POSITIVE label [OK]
Hint: Positive words give POSITIVE label with high score [OK]
Common Mistakes:
  • Expecting NEGATIVE label for positive text
  • Thinking the output is a string rather than a list of dicts
  • Confusing syntax errors with runtime output
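Because the pipeline returns a list with one dictionary per input, a single string still produces a one-element list. A hypothetical helper `describe` that unpacks it; the result literal is the approximate value from option B, not real model output:

```python
def describe(result):
    """Turn a sentiment pipeline result into a short 'LABEL (score)' string.

    `result` is a list with one {'label': ..., 'score': ...} dict per input text.
    """
    if not result:
        raise ValueError("expected at least one prediction")
    prediction = result[0]
    return f"{prediction['label']} ({prediction['score']:.2f})"


print(describe([{'label': 'POSITIVE', 'score': 0.99}]))  # → POSITIVE (0.99)
```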
4. You wrote this code but get an error: NameError: name 'pipeline' is not defined. What is the likely fix?
sentiment = pipeline('sentiment-analysis')
result = sentiment('I hate rain.')
print(result)
medium
A. Add from transformers import pipeline before using pipeline
B. Change 'sentiment-analysis' to 'sentiment'
C. Replace pipeline with sentiment_pipeline
D. Remove parentheses from pipeline call

Solution

  1. Step 1: Identify cause of NameError

    The error means Python does not know what pipeline is because it was not imported.
  2. Step 2: Fix by importing pipeline function

    Adding from transformers import pipeline defines pipeline so the code runs correctly.
  3. Final Answer:

    Add from transformers import pipeline before using pipeline -> Option A
  4. Quick Check:

    Import missing = NameError fixed [OK]
Hint: Always import pipeline before using it [OK]
Common Mistakes:
  • Changing task name instead of importing
  • Assuming pipeline is built-in without import
  • Removing parentheses causing syntax errors
5. You want to analyze customer reviews but some reviews are empty strings or just spaces. How should you modify your sentiment analysis pipeline to handle this before prediction?
hard
A. Replace empty reviews with the word 'neutral' and analyze
B. Pass all reviews directly to the pipeline without changes
C. Filter out empty or whitespace-only reviews before passing to the pipeline
D. Use a different pipeline for empty reviews

Solution

  1. Step 1: Understand the problem with empty inputs

    Empty or whitespace-only texts do not contain sentiment and can cause errors or meaningless results.
  2. Step 2: Apply filtering before analysis

    Removing or skipping these reviews ensures the pipeline only processes real text, which avoids errors and meaningless predictions; skipped reviews can be reported separately instead of receiving an invented sentiment.
  3. Final Answer:

    Filter out empty or whitespace-only reviews before passing to the pipeline -> Option C
  4. Quick Check:

    Remove empty inputs before analysis [OK]
Hint: Skip empty reviews to avoid errors [OK]
Common Mistakes:
  • Passing empty strings causing errors
  • Replacing empty with unrelated words
  • Using multiple pipelines unnecessarily
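A minimal sketch of option C. The helper `analyze_reviews` is hypothetical; the classifier is passed in, so any callable that maps a list of strings to a list of dicts works, such as a Transformers `pipeline('sentiment-analysis')` object, which accepts a list of texts. Skipped reviews get `None`, so the output stays aligned with the input.

```python
from typing import Callable, List, Optional


def analyze_reviews(reviews: List[str],
                    classifier: Callable[[List[str]], List[dict]]) -> List[Optional[dict]]:
    """Classify non-empty reviews; return None for empty or whitespace-only ones."""
    keep = [i for i, text in enumerate(reviews) if text.strip()]
    # Only call the classifier when there is something to classify
    predictions = classifier([reviews[i] for i in keep]) if keep else []
    results: List[Optional[dict]] = [None] * len(reviews)
    for i, prediction in zip(keep, predictions):
        results[i] = prediction
    return results
```

Keeping a `None` placeholder, rather than dropping entries, means index i in the result always refers to review i, which makes it easy to join predictions back to the original data.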