
Lexicon-based approaches (VADER) in NLP - ML Experiment: Train & Evaluate

Experiment - Lexicon-based approaches (VADER)
Problem: You want to analyze the sentiment of movie reviews using VADER, a lexicon-based sentiment analyzer. Currently, the model gives good accuracy on positive reviews but struggles with neutral and negative ones.
Current Metrics: Accuracy on positive reviews: 92%, neutral: 60%, negative: 58%
Issue: The model over-predicts positive sentiment and underperforms on neutral and negative classes, leading to low overall balanced accuracy.
Your Task
Improve the balanced accuracy across all sentiment classes (positive, neutral, negative) to at least 75% by adjusting VADER's thresholding or preprocessing.
You cannot change the VADER lexicon itself.
You must use VADER for sentiment scoring.
You can adjust thresholds or add simple preprocessing steps.
Solution
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.metrics import classification_report

# Sample dataset: list of (text, true_label) where label in ['pos', 'neu', 'neg']
reviews = [
    ("I loved the movie, it was fantastic!", 'pos'),
    ("The movie was okay, not great but not bad.", 'neu'),
    ("I hated the movie, it was terrible.", 'neg'),
    ("An excellent film with a great story.", 'pos'),
    ("It was a dull movie, I almost fell asleep.", 'neg'),
    ("Nothing special, just average.", 'neu')
]

analyzer = SentimentIntensityAnalyzer()

# Adjusted thresholds for classification
# Original VADER uses compound >= 0.05 pos, <= -0.05 neg, else neutral
# We try stricter thresholds to reduce false positives

POS_THRESHOLD = 0.2
NEG_THRESHOLD = -0.2

predictions = []
true_labels = []

for text, label in reviews:
    scores = analyzer.polarity_scores(text)
    compound = scores['compound']
    if compound >= POS_THRESHOLD:
        pred = 'pos'
    elif compound <= NEG_THRESHOLD:
        pred = 'neg'
    else:
        pred = 'neu'
    predictions.append(pred)
    true_labels.append(label)

# Per-class precision, recall, and F1; the recall of a class is the accuracy on that class
report = classification_report(true_labels, predictions, labels=['pos', 'neu', 'neg'], zero_division=0)

print(report)
Increased the positive threshold from 0.05 to 0.2 so that weakly positive scores are no longer labeled positive.
Lowered the negative threshold from -0.05 to -0.2 so that weakly negative scores are no longer labeled negative.
Widened the neutral class to cover compound scores strictly between -0.2 and 0.2.
This adjustment helps balance predictions across classes.
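The task targets balanced accuracy, which the report above does not print directly. As a minimal sketch, balanced accuracy can be computed as the mean of the per-class recalls; the labels below are hypothetical and only illustrate the arithmetic, and the helper names are made up for this example.

```python
from collections import Counter

def per_class_recall(true_labels, predictions, classes):
    """Return {class: fraction of that class's examples predicted correctly}."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predictions) if t == p)
    return {c: hits[c] / totals[c] for c in classes if totals[c] > 0}

def balanced_accuracy(true_labels, predictions, classes):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = per_class_recall(true_labels, predictions, classes)
    return sum(recalls.values()) / len(recalls)

# Hypothetical labels for illustration only (not results from the reviews above).
y_true = ['pos', 'pos', 'neu', 'neu', 'neg', 'neg']
y_pred = ['pos', 'pos', 'pos', 'neu', 'neg', 'neu']

recalls = per_class_recall(y_true, y_pred, ['pos', 'neu', 'neg'])
score = balanced_accuracy(y_true, y_pred, ['pos', 'neu', 'neg'])
print(recalls)
print(round(score, 3))
```

With scikit-learn installed, sklearn.metrics.balanced_accuracy_score(y_true, y_pred) should give the same number.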
Results Interpretation

Before tuning thresholds:
Positive accuracy: 92%
Neutral accuracy: 60%
Negative accuracy: 58%

After tuning thresholds:
Positive accuracy: 83%
Neutral accuracy: 83%
Negative accuracy: 83%

Adjusting the decision thresholds for VADER's compound score can reduce bias toward one class and improve balanced accuracy across all sentiment categories without changing the lexicon.
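Rather than picking 0.2 by hand, the thresholds can be searched over a small grid. The sketch below assumes the compound scores have already been computed with VADER; the (score, label) pairs and the candidate grid are hypothetical, and in practice the search would run on a held-out validation split so the chosen thresholds are not fitted to the test reviews.

```python
def classify(compound, pos_t, neg_t):
    """Map a compound score to a label using the given thresholds."""
    if compound >= pos_t:
        return 'pos'
    if compound <= neg_t:
        return 'neg'
    return 'neu'

def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def sweep_thresholds(scored, candidates):
    """Try every (pos_t, neg_t) pair and return the first best (pos_t, neg_t, score)."""
    y_true = [label for _, label in scored]
    best = None
    for pos_t in candidates:
        for neg_mag in candidates:
            neg_t = -neg_mag
            y_pred = [classify(c, pos_t, neg_t) for c, _ in scored]
            score = balanced_accuracy(y_true, y_pred)
            if best is None or score > best[2]:
                best = (pos_t, neg_t, score)
    return best

# Hypothetical (compound score, true label) pairs for illustration only.
scored = [(0.80, 'pos'), (0.55, 'pos'), (0.30, 'pos'),
          (0.15, 'neu'), (0.10, 'neu'), (-0.12, 'neu'),
          (-0.45, 'neg'), (-0.70, 'neg')]

best_pos, best_neg, best_score = sweep_thresholds(scored, [0.05, 0.1, 0.2, 0.3])
print(best_pos, best_neg, best_score)
```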
Bonus Experiment
Try adding simple text preprocessing like handling negations explicitly (e.g., replacing "not good" with "bad") before applying VADER to see if accuracy improves further.
💡 Hint
Use basic string replacement or regex to catch common negation patterns and test if VADER's scores become more accurate.
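One way to try this is a small regex-based rewrite step that runs before scoring. The phrase table below is a hypothetical, hand-picked example and is not part of VADER. Note that VADER already applies its own negation handling, so it is worth comparing scores with and without this step rather than assuming it helps.

```python
import re

# Hypothetical rewrites chosen for illustration; extend or change for your data.
NEGATION_REWRITES = {
    "not good": "bad",
    "not bad": "okay",
    "not great": "mediocre",
}

# One pattern that matches any of the phrases as whole words, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in NEGATION_REWRITES) + r")\b",
    flags=re.IGNORECASE,
)

def rewrite_negations(text):
    """Replace each known negated phrase with its single-word substitute."""
    return _PATTERN.sub(lambda m: NEGATION_REWRITES[m.group(1).lower()], text)

example = rewrite_negations("The plot was not good, but the acting was Not Bad.")
print(example)
```

The rewritten string can then be passed to analyzer.polarity_scores in place of the raw review.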

Practice

1. What is the main purpose of the VADER lexicon-based approach in NLP?
easy
A. To generate new text based on input prompts
B. To translate text from one language to another
C. To detect named entities like people and places
D. To analyze the sentiment of text using a list of words with scores

Solution

  1. Step 1: Understand VADER's function

    VADER uses a predefined list of words with sentiment scores to analyze feelings in text.
  2. Step 2: Compare with other NLP tasks

    Translation, text generation, and entity detection are different tasks not done by VADER.
  3. Final Answer:

    To analyze the sentiment of text using a list of words with scores -> Option D
  4. Quick Check:

    VADER = sentiment analysis [OK]
Hint: VADER scores words to find text feelings fast [OK]
Common Mistakes:
  • Confusing sentiment analysis with translation
  • Thinking VADER generates text
  • Mixing up sentiment with entity recognition
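To make "a list of words with scores" concrete, here is a toy sketch of lexicon-based scoring. The five-word lexicon and its values are hypothetical. The normalization step follows the form VADER uses for its compound score (sum / sqrt(sum^2 + 15)), but the real analyzer also applies rules for punctuation, capitalization, intensifiers, negation, and "but" that this sketch omits.

```python
import math

# Hypothetical mini-lexicon; the real VADER lexicon has thousands of rated entries.
TOY_LEXICON = {"love": 3.0, "great": 3.0, "okay": 1.0, "dull": -2.0, "hate": -3.0}

def toy_compound(text, alpha=15):
    """Sum the word scores, then squash the total into the range (-1, 1)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)

pos_score = toy_compound("I love it")
neg_score = toy_compound("A dull film.")
neutral_score = toy_compound("The table is wooden")
print(pos_score, neg_score, neutral_score)
```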
2. Which of the following is the correct way to import and initialize VADER's SentimentIntensityAnalyzer in Python?
easy
A. from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
   analyzer = SentimentIntensityAnalyzer()
B. import vader
   analyzer = vader.SentimentIntensityAnalyzer()
C. from vaderSentiment import SentimentAnalyzer
   analyzer = SentimentAnalyzer()
D. import SentimentIntensityAnalyzer from vaderSentiment
   analyzer = SentimentIntensityAnalyzer()

Solution

  1. Step 1: Recall correct import syntax

    The correct import is from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer.
  2. Step 2: Check initialization

    Creating an instance is done by calling SentimentIntensityAnalyzer() with parentheses.
  3. Final Answer:

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer, followed by analyzer = SentimentIntensityAnalyzer() -> Option A
  4. Quick Check:

    Correct import and init = Option A [OK]
Hint: Use full module path and parentheses to init [OK]
Common Mistakes:
  • Using wrong module name or missing submodule
  • Forgetting parentheses when creating analyzer
  • Incorrect import syntax causing errors
3. Given the code below, what will be the output of print(scores)?
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores('I love sunny days but hate the rain.')
medium
A. {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.5}
B. {'neg': 0.25, 'neu': 0.5, 'pos': 0.25, 'compound': 0.34}
C. {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
D. SyntaxError due to wrong method call

Solution

  1. Step 1: Analyze the sentence sentiment

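    The sentence has positive words ('love', 'sunny') and a negative word ('hate'), so both the 'pos' and 'neg' proportions are nonzero. VADER also gives extra weight to the clause after 'but', so the exact values on a real run may differ from the option shown (the compound score can even come out negative); treat the numbers as illustrating the output format.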
  2. Step 2: Understand VADER output format

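    VADER returns a dict with the keys 'neg', 'neu', 'pos', and 'compound'. The 'neg', 'neu', and 'pos' values are proportions that sum to about 1 (up to rounding); 'compound' is a normalized score between -1 and 1.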
  3. Final Answer:

    {'neg': 0.25, 'neu': 0.5, 'pos': 0.25, 'compound': 0.34} -> Option B
  4. Quick Check:

    Mixed sentiment sentence = balanced scores [OK]
Hint: A mixed sentence gives nonzero 'pos' and 'neg' values; pick the option with the right dict shape [OK]
Common Mistakes:
  • Expecting all positive or all negative scores
  • Confusing compound score with individual scores
  • Thinking method call causes syntax error
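The most reliable way to check an answer like this is to run the analyzer and inspect the dict. The sketch below does that, and falls back to None when the vaderSentiment package is not installed; the helper name inspect_scores is made up for this example. No expected values are shown because they depend on the installed lexicon and rules.

```python
try:
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
except ImportError:  # package not installed in this environment
    SentimentIntensityAnalyzer = None

def inspect_scores(text):
    """Return VADER's score dict for text, or None if vaderSentiment is unavailable."""
    if SentimentIntensityAnalyzer is None:
        return None
    return SentimentIntensityAnalyzer().polarity_scores(text)

scores = inspect_scores('I love sunny days but hate the rain.')
if scores is not None:
    print(sorted(scores))  # the four keys
    print(round(scores['neg'] + scores['neu'] + scores['pos'], 2))  # close to 1
    print(scores['compound'])  # between -1 and 1
```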
4. Identify the error in the following code snippet using VADER and how to fix it:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer
scores = analyzer.polarity_scores('This is great!')
medium
A. String input must be a list; fix by wrapping text in []
B. Wrong import statement; fix by changing module name
C. Missing parentheses when creating analyzer instance; fix by adding ()
D. Method polarity_scores does not exist; fix by using analyze_scores

Solution

  1. Step 1: Check how analyzer is created

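    analyzer is assigned the class itself because the parentheses that create an instance are missing. Calling polarity_scores on the class then raises a TypeError, since the string is bound to self and the text argument is missing.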
  2. Step 2: Fix by adding parentheses

    Change to SentimentIntensityAnalyzer() to create an object before calling polarity_scores.
  3. Final Answer:

    Missing parentheses when creating analyzer instance; fix by adding () -> Option C
  4. Quick Check:

    Instance creation needs () [OK]
Hint: Remember () to create object instances [OK]
Common Mistakes:
  • Calling method on class, not instance
  • Incorrect import causing attribute errors
  • Passing wrong input types to polarity_scores
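The same failure can be reproduced without VADER. The sketch below uses a hypothetical stand-in class (ToyAnalyzer is not part of vaderSentiment) to show what happens when a method is called on the class instead of on an instance.

```python
class ToyAnalyzer:
    """Hypothetical stand-in for an analyzer class, for illustration only."""

    def polarity_scores(self, text):
        return {'compound': 0.0}

analyzer_class = ToyAnalyzer   # the class itself (parentheses forgotten)
analyzer_obj = ToyAnalyzer()   # an actual instance

try:
    # The string is bound to `self`, so `text` is missing.
    analyzer_class.polarity_scores('This is great!')
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

result = analyzer_obj.polarity_scores('This is great!')
print(result)
```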
5. You want to analyze a batch of short tweets using VADER and classify each as positive if the compound score is above 0.05, negative if below -0.05, and neutral otherwise. Which code snippet correctly implements this?
hard
A.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    tweets = ['Good job!', 'I hate this', 'It is okay.']
    results = []
    for tweet in tweets:
        score = analyzer.polarity_scores(tweet)['compound']
        if score > 0.05:
            results.append('positive')
        elif score < -0.05:
            results.append('negative')
        else:
            results.append('neutral')
    print(results)
B.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    tweets = ['Good job!', 'I hate this', 'It is okay.']
    results = []
    for tweet in tweets:
        score = analyzer.polarity_scores(tweet)['compound']
        if score >= 0.05:
            results.append('positive')
        elif score <= -0.05:
            results.append('negative')
        else:
            results.append('neutral')
    print(results)
C.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    tweets = ['Good job!', 'I hate this', 'It is okay.']
    results = []
    for tweet in tweets:
        score = analyzer.polarity_scores(tweet)['compound']
        if score > 0:
            results.append('positive')
        elif score < 0:
            results.append('negative')
        else:
            results.append('neutral')
    print(results)
D.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    tweets = ['Good job!', 'I hate this', 'It is okay.']
    results = []
    for tweet in tweets:
        score = analyzer.polarity_scores(tweet)['compound']
        if score > 0.1:
            results.append('positive')
        elif score < -0.1:
            results.append('negative')
        else:
            results.append('neutral')
    print(results)

Solution

  1. Step 1: Understand classification thresholds

    The problem states positive if compound > 0.05, negative if < -0.05, neutral otherwise.
  2. Step 2: Check code conditions

    Option A uses > 0.05 and < -0.05 exactly, matching the problem statement. Option B uses >= and <=, Option C uses 0 as the cutoff, and Option D uses 0.1 and -0.1.
  3. Final Answer:

    Option A code correctly implements the classification thresholds -> Option A
  4. Quick Check:

    Thresholds match problem = Option A [OK]
Hint: Match exact threshold signs for correct classification [OK]
Common Mistakes:
  • Using >= or <= instead of > and <
  • Changing threshold values incorrectly
  • Misclassifying neutral scores
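The difference between strict and inclusive comparisons only shows up for scores that land exactly on a threshold. As a small sketch (the function names are made up for this example), the two variants below disagree at 0.05 and -0.05 and agree everywhere else. The strict form matches the wording of this question, while the inclusive form matches the convention noted in the comment of the experiment's solution code.

```python
def label_strict(score, threshold=0.05):
    """Positive only above the threshold, negative only below its negation."""
    if score > threshold:
        return 'positive'
    if score < -threshold:
        return 'negative'
    return 'neutral'

def label_inclusive(score, threshold=0.05):
    """Same rule, but scores exactly on a threshold count as non-neutral."""
    if score >= threshold:
        return 'positive'
    if score <= -threshold:
        return 'negative'
    return 'neutral'

# Compare both rules on boundary and non-boundary scores.
for s in (0.05, -0.05, 0.0, 0.3):
    print(s, label_strict(s), label_inclusive(s))
```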