Beam Search in NLP: What It Is and How It Works
beam search is a method to find the best sequence of words by keeping multiple top guesses at each step instead of just one. It balances between exploring many options and focusing on the most likely sequences to improve prediction quality.How It Works
Imagine you are trying to guess a sentence word by word. Instead of picking only the single best word each time, beam search keeps a small list of the best guesses (called the beam) at every step. This way, it explores several possible sentences in parallel.
At each step, beam search looks at all possible next words for each sentence in the beam, scores them, and keeps only the top few sequences with the highest scores. This process repeats until the sentence is complete. It’s like having a few friends guessing sentences together and sharing their best ideas, so you don’t miss good options by focusing on just one guess.
Example
This example shows a simple beam search to find the best sequence of numbers based on given scores.
def beam_search(scores, beam_width): sequences = [[[], 0]] # list of [sequence, score] for step_scores in scores: all_candidates = [] for seq, score in sequences: for i, s in enumerate(step_scores): candidate = [seq + [i], score + s] # use sum of scores (higher is better) all_candidates.append(candidate) # sort candidates by score and keep top beam_width ordered = sorted(all_candidates, key=lambda tup: tup[1], reverse=True) sequences = ordered[:beam_width] return sequences # Scores for 3 steps, each with 4 possible next tokens scores = [ [0.1, 0.4, 0.3, 0.2], [0.3, 0.2, 0.5, 0.1], [0.4, 0.1, 0.2, 0.3] ] result = beam_search(scores, beam_width=2) print(result)
When to Use
Beam search is useful when generating text, like in machine translation, speech recognition, or chatbots. It helps find better sentences than just picking the single best word each time (called greedy search), but without trying every possible sentence (which is too slow).
Use beam search when you want a good balance between quality and speed in sequence prediction tasks. It works well when the output is a sequence of words or tokens and you want to consider multiple possibilities to improve accuracy.
Key Points
- Beam search keeps multiple best guesses at each step instead of one.
- It balances exploring options and focusing on likely sequences.
- Commonly used in NLP tasks like translation and text generation.
- Beam width controls how many sequences are kept; bigger means better results but slower.
