Bidirectional LSTM models are often used for sequence tasks such as text classification, named entity recognition, and sentiment analysis. The key metrics are accuracy (overall correctness), precision (how many predicted positives are actually positive), recall (how many actual positives the model finds), and F1 score (the harmonic mean of precision and recall). Together they show whether the model turns the context it reads in both directions into reliable predictions.
Bidirectional LSTM in NLP - Model Metrics & Evaluation
Actual \ Predicted | Positive | Negative
-------------------|----------|---------
Positive | 80 | 20
Negative | 10 | 90
Here, True Positives (TP) = 80, False Negatives (FN) = 20, False Positives (FP) = 10, True Negatives (TN) = 90. Total samples = 200.
Accuracy = (80 + 90) / 200 = 0.85
Precision = 80 / (80 + 10) = 0.89
Recall = 80 / (80 + 20) = 0.80
F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84
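The arithmetic above can be checked with a short script. This is a minimal sketch in plain Python; the function name classification_metrics is made up for illustration and is not part of any library.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the four metrics from raw confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrix above.
metrics = classification_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.85
# precision: 0.89
# recall: 0.80
# f1: 0.84
```

The sketch does not guard against division by zero, so it assumes at least one predicted positive and one actual positive.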
Imagine a Bidirectional LSTM used for spam detection in emails:
- High Precision: The model marks emails as spam only when very sure. This means fewer good emails are wrongly marked as spam (low false positives).
- High Recall: The model catches almost all spam emails, but might mark some good emails as spam (higher false positives).
Depending on what matters more (missing spam or wrongly blocking good emails), you choose to optimize precision or recall.
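One common way to move between the two extremes is to change the decision threshold applied to the model's predicted spam probability. The sketch below assumes a sigmoid output and uses invented scores and labels purely for illustration; precision_recall_at is a hypothetical helper, not a library function.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is marked as spam."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted spam probabilities and true labels (1 = spam, 0 = good).
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

strict = precision_recall_at(0.80, scores, labels)   # mark as spam only when very sure
lenient = precision_recall_at(0.35, scores, labels)  # mark as spam more readily

print("strict threshold  (precision, recall):", strict)   # (1.0, 0.5)
print("lenient threshold (precision, recall):", lenient)  # (0.8, 1.0)
```

With the strict threshold no good email is blocked but half the spam gets through; with the lenient one all spam is caught at the cost of one blocked good email.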
Good: Accuracy above 85%, precision and recall both above 80%, and an F1 score of about 80% or higher. As a rough guide, scores in this range suggest the model has learned useful patterns from the sequence context and makes reliable predictions.
Bad: Accuracy near 50-60% (close to random guessing on a balanced binary task), precision or recall below 50%, or a large gap between precision and recall. This suggests the model has not learned meaningful patterns or is biased toward one class.
- Accuracy Paradox: High accuracy but poor recall or precision, especially with imbalanced classes.
- Data Leakage: Information from the test set or from the future accidentally ends up in the training data, inflating metrics.
- Overfitting: Very high training accuracy but low test accuracy means the model memorizes instead of generalizing.
Your Bidirectional LSTM model has 98% accuracy but only 12% recall on the positive class (e.g., fraud detection). Is it good for production?
Answer: No, because the model misses most positive cases (low recall). Even with high accuracy, it fails to find important examples. For tasks like fraud detection, high recall is critical to catch as many frauds as possible.
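To see how 98% accuracy and 12% recall can coexist, here is one set of counts that produces exactly those numbers. The counts are invented for illustration (10,000 transactions, 200 of them fraudulent) and do not come from any real dataset.

```python
# Hypothetical confusion-matrix counts for a fraud detector.
tp, fn = 24, 176     # 200 real frauds: 24 caught, 176 missed
fp, tn = 24, 9776    # 9,800 legitimate transactions: 24 wrongly flagged

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# Baseline: a "model" that never predicts fraud gets every legitimate case right.
baseline_accuracy = (fp + tn) / total

print(f"accuracy: {accuracy:.2f}")                    # accuracy: 0.98
print(f"recall: {recall:.2f}")                        # recall: 0.12
print(f"always-negative baseline accuracy: {baseline_accuracy:.2f}")  # 0.98
```

In this example the model's accuracy is no better than a baseline that never flags anything, which is why accuracy alone is misleading on imbalanced data.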
Practice
Bidirectional LSTM compared to a standard LSTM?
Solution
Step 1: Understand LSTM directionality
A standard LSTM reads the input sequence only in the forward direction, from start to end.
Step 2: Analyze Bidirectional LSTM behavior
A Bidirectional LSTM reads the sequence both forward and backward, capturing information from past and future context.
Final Answer:
It processes the input sequence in both forward and backward directions to capture more context. -> Option C
Quick Check:
Bidirectional means forward + backward = C [OK]
- Thinking it only reads backward
- Assuming it reduces parameters
- Confusing it with simpler RNNs
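The forward-plus-backward idea can be illustrated without any deep learning library. The sketch below uses a toy recurrence (a single tanh update), not a real LSTM cell, and the function names are made up for illustration. It only shows how each position ends up with one state that has seen the left context and one that has seen the right context.

```python
import math


def run_direction(sequence, reverse=False):
    """Toy recurrence h_t = tanh(0.5 * h_prev + x_t). This is NOT an LSTM cell."""
    order = list(reversed(sequence)) if reverse else list(sequence)
    h, states = 0.0, []
    for x in order:
        h = math.tanh(0.5 * h + x)
        states.append(h)
    if reverse:
        states.reverse()  # re-align states with the original positions
    return states


def bidirectional(sequence):
    """Pair each position's forward state with its backward state."""
    forward = run_direction(sequence)
    backward = run_direction(sequence, reverse=True)
    return list(zip(forward, backward))


# A single "event" in the middle of an otherwise empty sequence.
sequence = [0.0, 0.0, 1.0, 0.0, 0.0]
outputs = bidirectional(sequence)
for position, (fwd, bwd) in enumerate(outputs):
    print(position, round(fwd, 3), round(bwd, 3))
```

At position 0 the forward state is still zero because the event has not been read yet, while the backward state is already non-zero; at the last position the roles are swapped.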
Solution
Step 1: Recall Keras Bidirectional syntax
In Keras, the Bidirectional wrapper takes an RNN layer like LSTM as its argument.
Step 2: Check each option
model.add(Bidirectional(LSTM(units=64))) correctly wraps LSTM inside Bidirectional. The other options misuse the syntax or parameters.
Final Answer:
model.add(Bidirectional(LSTM(units=64))) -> Option A
Quick Check:
Bidirectional wraps LSTM layer = A [OK]
- Putting Bidirectional inside LSTM
- Passing units to Bidirectional instead of LSTM
- Using bidirectional=True parameter in LSTM
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense
model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=False), input_shape=(5, 8)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
x = np.random.random((2, 5, 8))
pred = model.predict(x)
print(pred.shape)
What will be the shape of pred?
Solution
Step 1: Understand model output shape
The Bidirectional LSTM with 10 units outputs 20 features (10 forward + 10 backward). Since return_sequences=False, it returns only the final state of each direction (the forward pass after reading the last timestep, the backward pass after reading the first), concatenated into shape (batch_size, 20).
Step 2: Dense layer output shape
The Dense layer with 1 unit outputs shape (batch_size, 1). Input batch size is 2, so output shape is (2, 1).
Final Answer:
(2, 1) -> Option B
Quick Check:
Batch size 2, Dense 1 unit = (2, 1) [OK]
- Confusing return_sequences=True vs False
- Forgetting bidirectional doubles units
- Mixing batch and timestep dimensions
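The shape reasoning in this solution can be written down as plain arithmetic. The two helpers below are hypothetical and do not call Keras; they encode the assumption that the default merge mode concatenates the two directions, while 'sum', 'mul', and 'ave' keep the feature size unchanged.

```python
def bidirectional_lstm_output_shape(batch, timesteps, units,
                                    return_sequences=False, merge_mode="concat"):
    """Shape produced by a bidirectional LSTM under the stated assumptions."""
    if merge_mode == "concat":
        features = 2 * units          # forward and backward outputs side by side
    elif merge_mode in ("sum", "mul", "ave"):
        features = units              # the two directions are merged element-wise
    else:
        raise ValueError(f"unsupported merge_mode: {merge_mode}")
    if return_sequences:
        return (batch, timesteps, features)
    return (batch, features)


def dense_output_shape(input_shape, units):
    """A dense layer acts on the last axis and leaves the other axes alone."""
    return tuple(input_shape[:-1]) + (units,)


# Values from the question: batch of 2, 5 timesteps, 10 LSTM units, Dense(1).
lstm_shape = bidirectional_lstm_output_shape(batch=2, timesteps=5, units=10)
pred_shape = dense_output_shape(lstm_shape, units=1)
print(lstm_shape, pred_shape)  # (2, 20) (2, 1)

# With return_sequences=True the timestep axis is kept all the way through.
seq_shape = bidirectional_lstm_output_shape(2, 5, 10, return_sequences=True)
print(seq_shape, dense_output_shape(seq_shape, units=1))  # (2, 5, 20) (2, 5, 1)
```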
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense
model = Sequential()
model.add(Bidirectional(LSTM(32), input_shape=(10, 16)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# Training data
X_train = np.random.random((100, 10, 16))
y_train = np.random.random((100,))  # 1-D targets, shape (100,)
model.fit(X_train, y_train, epochs=5)
The error says:
ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (100,)
What is the fix?
Solution
Step 1: Understand error message
The model expects targets with shape (batch_size, 1) because Dense(1) outputs shape (None, 1). But y_train has shape (100,), missing the last dimension.
Step 2: Fix target shape
Reshape y_train to (100, 1) to match the model output shape. This fixes the mismatch error.
Final Answer:
Change y_train shape to (100, 1) by reshaping it. -> Option D
Quick Check:
Target shape matches output shape = D [OK]
- Changing model output units instead of target shape
- Setting return_sequences=True unnecessarily
- Removing Bidirectional without reason
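The reshape itself is a one-liner in NumPy. The sketch below shows only the target array, without the model. Depending on the Keras version, 1-D targets may be accepted without an error, but an explicit column shape is harmless either way.

```python
import numpy as np

# Same kind of targets as in the question: one value per training example.
y_train = np.random.random((100,))

# -1 lets NumPy infer the number of rows; 1 adds the missing last dimension.
y_column = y_train.reshape(-1, 1)

# Equivalent alternatives: np.expand_dims(y_train, axis=-1) or y_train[:, np.newaxis]
print(y_train.shape, "->", y_column.shape)  # (100,) -> (100, 1)
```

The values are unchanged; only the shape differs, so y_column can be passed to model.fit in place of y_train.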
Solution
Step 1: Understand context capture
Bidirectional LSTM reads sequences forward and backward, capturing full context.
Step 2: Fixed-size vector output
Using return_sequences=True outputs a sequence, so applying GlobalMaxPooling1D converts it to a fixed-size vector summarizing important features.
Step 3: Compare options
- Embedding -> Bidirectional(LSTM with return_sequences=True) -> GlobalMaxPooling1D -> Dense: bidirectional, with pooling over all timesteps; best for full context and a fixed-size vector.
- Embedding -> Bidirectional(LSTM with return_sequences=False) -> Dense: skips pooling, so the output is only the final state of each direction.
- Embedding -> LSTM with return_sequences=False -> Dense: unidirectional, so there is no backward context.
- Embedding -> Bidirectional(LSTM with return_sequences=True) -> Dense: no pooling, so Dense is applied at every timestep and the output is still a sequence rather than a fixed-size vector.
Final Answer:
Embedding -> Bidirectional(LSTM with return_sequences=True) -> GlobalMaxPooling1D -> Dense -> Option A
Quick Check:
Pooling after bidirectional LSTM = A [OK]
- Using return_sequences=False loses sequence info
- Skipping pooling leads to shape mismatch
- Using unidirectional LSTM loses backward context
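What global max pooling does to a sequence of per-timestep outputs can be shown in plain Python. The function below is a hypothetical stand-in for the pooling step on a single example, with invented numbers; it is not the Keras layer itself.

```python
def global_max_pool_1d(sequence_output):
    """Take the maximum over time for each feature: (timesteps, features) -> (features,)."""
    return [max(column) for column in zip(*sequence_output)]


# Two hypothetical sequences of different length, 2 features per timestep.
short_seq = [[0.1, 0.9],
             [0.4, 0.2]]
long_seq = [[0.1, 0.9],
            [0.4, 0.2],
            [0.7, 0.3],
            [0.0, 0.5]]

print(global_max_pool_1d(short_seq))  # [0.4, 0.9]
print(global_max_pool_1d(long_seq))   # [0.7, 0.9]
```

Both results have length 2 regardless of how many timesteps went in, which is why a Dense layer placed after the pooling step always receives a fixed-size vector.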
