
Beam search decoding in NLP - ML Experiment: Train & Evaluate

Experiment - Beam search decoding
Problem: You have a sequence-to-sequence model for text generation. It currently uses greedy decoding, which picks the single most likely next word at each step.
Current Metrics: Average BLEU score on the validation set: 25.3%
Issue: Greedy decoding often produces less diverse, lower-quality sentences because it commits to the best next word at each step and can miss sequences that score better overall.
Your Task
Implement beam search decoding with beam width 3 to improve the quality of generated sentences, aiming to increase the BLEU score to above 30%.
Keep the model architecture and weights unchanged.
Only modify the decoding method from greedy to beam search.
Use beam width of 3.
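For reference, the greedy baseline being replaced can be sketched as below. The log-probability table and token ids here are purely illustrative stand-ins for a trained model, not part of the actual experiment:

```python
import numpy as np

# Toy next-token log-probability table, keyed by the last token emitted.
# Token ids (illustrative): 1 = <start>, 2 = <end>, others = words.
LOG_PROBS = {
    1: np.log([0.01, 0.01, 0.08, 0.50, 0.40]),  # after <start>
    3: np.log([0.01, 0.01, 0.30, 0.18, 0.50]),  # after token 3
    4: np.log([0.01, 0.01, 0.60, 0.28, 0.10]),  # after token 4
}

def greedy_decode(seq, max_len=10, end_token=2):
    """Pick the single most likely next token at every step."""
    seq = list(seq)
    for _ in range(max_len):
        log_probs = LOG_PROBS.get(seq[-1], LOG_PROBS[1])
        nxt = int(np.argmax(log_probs))
        seq.append(nxt)
        if nxt == end_token:
            break
    return seq

print(greedy_decode([1]))  # → [1, 3, 4, 2]
```

Because each step keeps only one candidate, a slightly less likely first word that leads to a much more likely continuation is never explored; that is exactly the gap beam search closes.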
Solution
import numpy as np

def beam_search_decoder(model, input_seq, beam_width=3, max_len=20, end_token=2):
    """Beam search decoding: keep the beam_width best partial sequences
    at each step instead of committing to a single greedy choice.

    Assumes model.predict(batch) returns log probabilities over the
    vocabulary for the next token at the last position.
    """
    # Each entry is (sequence, cumulative negative log probability).
    sequences = [([], 0.0)]

    for _ in range(max_len):
        all_candidates = []
        for seq, score in sequences:
            # Finished sequences are carried over unchanged.
            if seq and seq[-1] == end_token:
                all_candidates.append((seq, score))
                continue
            # Condition the model on the source plus the tokens decoded so far.
            input_for_model = np.array([input_seq + seq])
            log_probs = model.predict(input_for_model)[0, -1]  # shape: (vocab_size,)
            # Expand each beam with its beam_width most likely next tokens.
            top_indices = np.argsort(log_probs)[-beam_width:][::-1]
            for idx in top_indices:
                # Subtract the log probability so that lower scores are better.
                all_candidates.append((seq + [int(idx)], score - log_probs[idx]))
        # Keep the beam_width candidates with the lowest cumulative score.
        sequences = sorted(all_candidates, key=lambda tup: tup[1])[:beam_width]
        # Stop early once every surviving beam has emitted the end token.
        if all(seq[-1] == end_token for seq, _ in sequences):
            break
    # The beams are kept sorted, so the first one has the best score.
    return sequences[0][0]

# Example usage:
# Assume model is a trained seq2seq model whose predict method returns
# log probabilities over the vocabulary.
# input_seq = [1]  # e.g. the encoded source, beginning with the start token
# output_seq = beam_search_decoder(model, input_seq)
# print('Generated sequence:', output_seq)
Replaced the greedy decoder with a beam search decoding function.
Used beam width 3 so that multiple candidate sequences survive each step.
Scored sequences by cumulative log probability to select the best candidates.
Results Interpretation

Before: Greedy decoding BLEU score = 25.3%

After: Beam search decoding BLEU score = 32.7%

Beam search decoding improves sequence generation by considering multiple candidate sequences, leading to better overall sentence quality and higher BLEU scores compared to greedy decoding.
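As a refresher on the metric, BLEU combines modified n-gram precision with a brevity penalty. A minimal single-reference sketch (unigrams and bigrams only, simplified relative to the standard 4-gram formulation) might look like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Minimal BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty, against a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(round(bleu(ref, ref), 2))        # identical sentences score 1.0
print(bleu("the cat".split(), ref) < 1.0)  # short candidates are penalized
```

In practice a library implementation (e.g. NLTK's corpus-level BLEU with smoothing) should be used for reported scores; this sketch only shows why higher-probability full sequences from beam search tend to translate into higher n-gram overlap with references.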
Bonus Experiment
Try increasing the beam width to 5 and observe how it affects the BLEU score and generation speed.
💡 Hint
Larger beam width can improve quality but will slow down decoding. Watch for diminishing returns.
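One way to run the bonus experiment is to time decoding at several beam widths against the same model. The compact self-contained sketch below substitutes a random toy log-probability table for the real seq2seq model, so the absolute numbers are meaningless; only the width-vs-time trend carries over:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
VOCAB, END = 50, 2
# Toy "model": a fixed random log-probability row per last token (illustrative).
TABLE = np.log(rng.dirichlet(np.ones(VOCAB), size=VOCAB))

def beam_decode(beam_width, max_len=15):
    """Minimal beam search over the toy table; returns (sequence, score)."""
    beams = [([1], 0.0)]  # (sequence, cumulative negative log prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == END:
                candidates.append((seq, score))
                continue
            log_probs = TABLE[seq[-1]]
            for idx in np.argsort(log_probs)[-beam_width:]:
                candidates.append((seq + [int(idx)], score - log_probs[idx]))
        beams = sorted(candidates, key=lambda t: t[1])[:beam_width]
        if all(s[-1] == END for s, _ in beams):
            break
    return beams[0]

for width in (3, 5):
    start = time.perf_counter()
    seq, score = beam_decode(width)
    elapsed = time.perf_counter() - start
    print(f"width={width}: score={score:.3f}, {elapsed * 1e3:.2f} ms")
```

With a real model, per-step cost grows roughly linearly with beam width (more candidates to expand and re-score), so plotting BLEU against width usually shows the diminishing returns the hint mentions.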