Dependency parsing in NLP - ML Experiment: Train & Evaluate

Experiment - Dependency parsing
Problem: We want to build a model that performs dependency parsing, that is, one that analyzes a sentence and finds the grammatical relationships between its words. The current model is trained on a small dataset and achieves 95% accuracy on training data but only 70% accuracy on validation data.
Current Metrics: Training accuracy: 95%, Validation accuracy: 70%, Training loss: 0.15, Validation loss: 0.45
Issue: The model is overfitting: it performs very well on training data but poorly on validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85%, while keeping training accuracy below 92%.
You can only change model hyperparameters and add regularization techniques.
Do not change the dataset or model architecture drastically.
Solution
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Sample data placeholders (replace with actual data loading)
X_train, y_train = ...  # training data
X_val, y_val = ...      # validation data

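vocab_size = 10000
embedding_dim = 128
max_length = 50
num_labels = 40  # assumed size of the target label set; set it to match your data

model = Sequential([
    Embedding(vocab_size, embedding_dim, input_length=max_length),
    LSTM(64, return_sequences=True),
    Dropout(0.5),
    LSTM(32),
    Dropout(0.5),
    Dense(32, activation='relu'),
    # Output one probability per target label, not per vocabulary word
    Dense(num_labels, activation='softmax')
])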

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    epochs=20,
                    batch_size=64,
                    validation_data=(X_val, y_val),
                    callbacks=[early_stop])
Added Dropout layers after LSTM layers with rate 0.5 to reduce overfitting.
Reduced LSTM units from 128 to 64 and 32 to lower model complexity.
Added EarlyStopping callback to stop training when validation loss stops improving.
Set the Adam learning rate explicitly to 0.001 (the Keras default) for stable training.
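To make the early-stopping rule concrete, here is a minimal pure-Python sketch of patience-based stopping on validation loss. The function name `early_stopping_epoch` and the loss values are made up for illustration; this is not Keras code, only an approximation of what `EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)` does when `min_delta` is left at zero.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition
    return len(val_losses) - 1, best_epoch


# Hypothetical validation-loss curve (illustrative numbers only)
losses = [0.60, 0.45, 0.38, 0.33, 0.30, 0.31, 0.32, 0.34, 0.36]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(f"stopped after epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

With `restore_best_weights=True`, the model keeps the weights from the best epoch rather than those from the epoch at which training stopped.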
Results Interpretation

Before: Training accuracy 95%, Validation accuracy 70%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.30

Adding dropout, shrinking the LSTM layers, and stopping early reduced overfitting: validation accuracy improved while training accuracy dropped slightly. This shows how regularization and training control help models generalize better.
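The before/after numbers can be checked against the task targets with a little arithmetic. The sketch below uses only the metrics reported above; the helper names `generalization_gap` and `meets_task_goal` are hypothetical.

```python
def generalization_gap(train_acc, val_acc):
    """Gap between training and validation accuracy, in percentage points."""
    return train_acc - val_acc


def meets_task_goal(train_acc, val_acc):
    """Task targets: validation accuracy >= 85% and training accuracy < 92%."""
    return val_acc >= 85 and train_acc < 92


gap_before = generalization_gap(95, 70)
gap_after = generalization_gap(90, 87)
print(f"gap before: {gap_before} points, gap after: {gap_after} points")
print("goal met before:", meets_task_goal(95, 70))
print("goal met after:", meets_task_goal(90, 87))
```

A shrinking gap alongside rising validation accuracy is the pattern to look for when judging whether regularization is working.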
Bonus Experiment
Try using a pretrained language model like BERT for dependency parsing and compare the results.
💡 Hint
Use transfer learning with a pretrained BERT model and fine-tune it on your dependency parsing dataset.

Practice

1. What is the main purpose of dependency parsing in Natural Language Processing?
easy
A. To show how words in a sentence are connected
B. To translate sentences into another language
C. To count the number of words in a sentence
D. To generate new sentences automatically

Solution

  1. Step 1: Understand dependency parsing

    Dependency parsing analyzes sentence structure by showing relationships between words.
  2. Step 2: Compare options

    Only "To show how words in a sentence are connected" correctly describes this purpose; the other options describe different NLP tasks.
  3. Final Answer:

    To show how words in a sentence are connected -> Option A
  4. Quick Check:

    Dependency parsing = word connections [OK]
Hint: Dependency parsing = word connection map [OK]
Common Mistakes:
  • Confusing parsing with translation
  • Thinking it counts words only
  • Mixing with sentence generation
2. Which of the following is the correct way to access the dependency label of a token using spaCy in Python?
doc = nlp('I love cats')
easy
A. doc[1].dep_
B. doc.dep_[1]
C. doc[1].dependency
D. doc.dep[1]

Solution

  1. Step 1: Recall spaCy token attributes

    In spaCy, each token has a dep_ attribute accessed by doc[index].dep_.
  2. Step 2: Check options for correct syntax

    Only doc[1].dep_ uses correct attribute and indexing syntax.
  3. Final Answer:

    doc[1].dep_ -> Option A
  4. Quick Check:

    Token dependency label = doc[index].dep_ [OK]
Hint: Use token.dep_ to get dependency label [OK]
Common Mistakes:
  • Using wrong attribute name like dep or dependency
  • Trying to index dep_ attribute
  • Confusing token and doc object
3. Given the code below, what will be the output?
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('She eats an apple')
for token in doc:
    print(f'{token.text} -> {token.dep_}')
medium
A. She -> det, eats -> dobj, an -> nsubj, apple -> ROOT
B. She -> dobj, eats -> nsubj, an -> ROOT, apple -> det
C. She -> ROOT, eats -> nsubj, an -> dobj, apple -> det
D. She -> nsubj, eats -> ROOT, an -> det, apple -> dobj

Solution

  1. Step 1: Understand dependency roles in sentence

    In 'She eats an apple', 'eats' is the main verb (ROOT), 'She' is subject (nsubj), 'an' is determiner (det), 'apple' is direct object (dobj).
  2. Step 2: Match roles to output

    "She -> nsubj, eats -> ROOT, an -> det, apple -> dobj" correctly matches each word to its dependency label (the code prints one pair per line).
  3. Final Answer:

    She -> nsubj, eats -> ROOT, an -> det, apple -> dobj -> Option D
  4. Quick Check:

    Subject = nsubj, Verb = ROOT, Object = dobj [OK]
Hint: Main verb is ROOT; subject is nsubj; object is dobj [OK]
Common Mistakes:
  • Mixing subject and object labels
  • Confusing determiner with object
  • Assuming first word is ROOT
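For intuition about what the labels describe, the parse of 'She eats an apple' can be written out by hand as a small tree. The sketch below is a hand-built illustration, not spaCy output: each entry stores the word, the index of its head, and its label, and the ROOT points to itself (the same convention spaCy uses, where the root token's head is the token itself). The helper names are hypothetical.

```python
# (text, head_index, dependency_label); head_index is 0-based
parse = [
    ("She", 1, "nsubj"),    # subject of "eats"
    ("eats", 1, "ROOT"),    # main verb; its head is itself
    ("an", 3, "det"),       # determiner of "apple"
    ("apple", 1, "dobj"),   # direct object of "eats"
]


def find_root(parse):
    """Return the text of the single token whose head is itself."""
    roots = [text for i, (text, head, _) in enumerate(parse) if head == i]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


def format_arcs(parse):
    """Describe each token by its label and the word it attaches to."""
    return [f"{text} -> {dep} (head: {parse[head][0]})" for text, head, dep in parse]


print("root:", find_root(parse))
for line in format_arcs(parse):
    print(line)
```

Note that the root is found from the structure, not from word position, which is why assuming the first word is ROOT goes wrong.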
4. Identify the error in this spaCy dependency parsing code:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('Dogs bark loudly')
for token in doc:
    print(token.dep)
medium
A. Incorrect model name in spacy.load
B. doc should be a list, not a spaCy Doc object
C. Missing underscore in token.dep_ attribute
D. print statement syntax is wrong

Solution

  1. Step 1: Check token attribute usage

    spaCy tokens use dep_ (with underscore) to get dependency label as string; dep without underscore returns an integer ID.
  2. Step 2: Verify code correctness

    The code runs without error, but token.dep prints integer IDs rather than readable labels; the intent is to print labels, so the underscore is missing.
  3. Final Answer:

    Missing underscore in token.dep_ attribute -> Option C
  4. Quick Check:

    Use token.dep_ for labels, not token.dep [OK]
Hint: Use token.dep_ (with underscore) for readable labels [OK]
Common Mistakes:
  • Using token.dep instead of token.dep_
  • Assuming doc is wrong type
  • Thinking print syntax is incorrect
5. You want to extract all verbs and their direct objects from a sentence using dependency parsing. Which approach is best?
hard
A. Use only token text without parsing dependencies
B. Find tokens with POS tag 'VERB' and check their children with dependency label 'dobj'
C. Extract tokens with POS tag 'NOUN' ignoring dependencies
D. Select tokens with dependency label 'nsubj' only

Solution

  1. Step 1: Understand task requirements

    We want verbs and their direct objects, so we need to find verbs and check which tokens depend on them as direct objects (dobj).
  2. Step 2: Evaluate options

    "Find tokens with POS tag 'VERB' and check their children with dependency label 'dobj'" correctly finds verbs and their dobj children. The other options ignore dependencies or focus on subjects or nouns only.
  3. Final Answer:

    Find tokens with POS tag 'VERB' and check their children with dependency label 'dobj' -> Option B
  4. Quick Check:

    Verbs + dobj children = correct extraction [OK]
Hint: Look for verbs and their dobj children in dependency tree [OK]
Common Mistakes:
  • Ignoring dependency labels
  • Selecting only subjects
  • Using POS tags without dependencies
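The verb-plus-dobj approach can be sketched without loading a model by working on a hand-annotated sentence. Everything below is an illustrative stand-in: the `Tok` class, the annotations, and `verb_object_pairs` are hypothetical. With spaCy, the same check would read `token.pos_` on each token and `child.dep_` on each item of `token.children`.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # 0-based index of the head token (ROOT points to itself)


# Hand-annotated parse of "The cat chased a mouse" (an assumption, not model output)
sentence = [
    Tok("The", "DET", "det", 1),
    Tok("cat", "NOUN", "nsubj", 2),
    Tok("chased", "VERB", "ROOT", 2),
    Tok("a", "DET", "det", 4),
    Tok("mouse", "NOUN", "dobj", 2),
]


def verb_object_pairs(tokens):
    """Return (verb, direct_object) pairs found via 'dobj' children of verbs."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        for j, child in enumerate(tokens):
            # A child is any other token whose head is this verb
            if j != i and child.head == i and child.dep == "dobj":
                pairs.append((tok.text, child.text))
    return pairs


print(verb_object_pairs(sentence))
```

Filtering on the POS tag alone would return the verb but not its object, and filtering on nouns alone would also return the subject, which is why both the tag and the dependency label are checked.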