Beam search decoding in NLP

Beam search decoding searches for a high-probability output sequence from a model by exploring several candidate sequences in parallel instead of committing to a single one. It balances exploring many possibilities against keeping the search tractable.
import math

def beam_search_decoder(predictions, beam_width):
    # Each entry pairs a token sequence with its cumulative negative log probability
    sequences = [[[], 0.0]]
    for row in predictions:
        all_candidates = []
        for seq, score in sequences:
            for j, prob in enumerate(row):
                if prob > 0:  # log(0) is undefined, so zero-probability tokens are skipped
                    candidate = [seq + [j], score - math.log(prob)]
                    all_candidates.append(candidate)
        # Keep the beam_width candidates with the lowest (best) scores
        ordered = sorted(all_candidates, key=lambda tup: tup[1])
        sequences = ordered[:beam_width]
    return sequences
The predictions input is usually a list of probability distributions for each step in the sequence.
beam_width controls how many sequences to keep at each step; bigger means better results but slower.
beam_search_decoder([[0.1, 0.9], [0.8, 0.2]], beam_width=2)
beam_search_decoder([[0.3, 0.7, 0.0], [0.4, 0.4, 0.2]], beam_width=1)
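As a self-contained sketch of what these two calls do, the decoder can be run as a script (lower scores are better, since each score is a sum of negative log probabilities):

```python
import math

def beam_search_decoder(predictions, beam_width):
    # Track [token_sequence, cumulative negative log probability] pairs
    sequences = [[[], 0.0]]
    for row in predictions:
        all_candidates = []
        for seq, score in sequences:
            for j, prob in enumerate(row):
                if prob > 0:  # log(0) is undefined, so zero-probability tokens are skipped
                    all_candidates.append([seq + [j], score - math.log(prob)])
        sequences = sorted(all_candidates, key=lambda tup: tup[1])[:beam_width]
    return sequences

# With beam_width=2, both step-1 tokens survive; the best path is [1, 0]
print(beam_search_decoder([[0.1, 0.9], [0.8, 0.2]], beam_width=2))

# With beam_width=1 the search keeps only one sequence, i.e. greedy decoding;
# the zero-probability token at step 1 is skipped entirely
print(beam_search_decoder([[0.3, 0.7, 0.0], [0.4, 0.4, 0.2]], beam_width=1))
```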
This program runs beam search decoding on a small example with 3 steps and 3 tokens each. It prints the top 2 sequences found and their scores.
import math

def beam_search_decoder(predictions, beam_width):
    sequences = [[[], 0.0]]  # sequences and their scores (negative log probabilities)
    for row in predictions:
        all_candidates = []
        for seq, score in sequences:
            for j, prob in enumerate(row):
                if prob > 0:
                    candidate = [seq + [j], score - math.log(prob)]
                    all_candidates.append(candidate)
        sequences = sorted(all_candidates, key=lambda tup: tup[1])[:beam_width]
    return sequences

# Example predictions: 3 steps, 3 possible tokens each
predictions = [
    [0.1, 0.6, 0.3],
    [0.3, 0.5, 0.2],
    [0.4, 0.4, 0.2]
]
beam_width = 2
results = beam_search_decoder(predictions, beam_width)
print("Top sequences and their scores:")
for seq, score in results:
    print(f"Sequence: {seq}, Score: {score:.4f}")
Beam search is not guaranteed to find the highest-probability sequence, but it usually finds a good one at a fraction of the cost of exhaustive search, which grows exponentially with sequence length.
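Note that the simple decoder above uses per-step distributions that do not depend on the prefix chosen so far, so greedy decoding is actually optimal for it; in real models the next-token distribution is conditional on the prefix, which is where a narrow beam can miss the best sequence. A hypothetical two-step conditional model illustrating this (the distributions and the `conditional_beam_search` helper are made up for the example):

```python
import math

# Hypothetical model over tokens {0, 1}: the step-2 distribution
# depends on which token was chosen at step 1.
step1 = [0.6, 0.4]
step2 = {0: [0.5, 0.5], 1: [0.9, 0.1]}

def conditional_beam_search(beam_width):
    # beams: (sequence, cumulative negative log probability), lower is better
    beams = [([j], -math.log(p)) for j, p in enumerate(step1)]
    beams = sorted(beams, key=lambda b: b[1])[:beam_width]
    candidates = []
    for seq, score in beams:
        for j, p in enumerate(step2[seq[-1]]):
            candidates.append((seq + [j], score - math.log(p)))
    return sorted(candidates, key=lambda b: b[1])[:beam_width]

# beam_width=1 commits to token 0 (prob 0.6) and ends with a sequence of
# probability 0.30; beam_width=2 also keeps token 1 and finds [1, 0],
# whose total probability is 0.4 * 0.9 = 0.36.
print(conditional_beam_search(1)[0])
print(conditional_beam_search(2)[0])
```

Widening the beam recovers the globally best sequence here, but in general no finite beam width guarantees optimality.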
Working with log probabilities avoids numerical underflow from multiplying many small numbers, and it turns products of probabilities into sums of logs.
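A quick sketch of why this matters: multiplying many small probabilities underflows double-precision floats to zero, while the equivalent sum of negative logs stays in a comfortable numeric range.

```python
import math

probs = [0.01] * 200  # a long sequence of small per-token probabilities

# Multiplying directly underflows: the true value 1e-400 is below
# the smallest representable double, so the product collapses to 0.0
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0

# Summing negative log probabilities keeps the same information
# as an ordinary, well-behaved float (about 921.03)
score = sum(-math.log(p) for p in probs)
print(score)
```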
Choosing beam width is a trade-off: bigger means better results but slower computation.
Beam search keeps multiple best guesses at each step to find good output sequences.
It balances quality and speed by limiting how many sequences it tracks.
It is widely used in language tasks like translation and text generation.