Prompt Engineering / GenAIml~20 mins

Audio transcription (Whisper) in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Audio transcription (Whisper)

Problem:You want to transcribe audio files into text using the Whisper model. Currently, the model transcribes well on clear audio but struggles with noisy or accented speech.

Current Metrics:Word Error Rate (WER): 25%, Character Error Rate (CER): 18%

Issue:The model overfits to clean audio and performs poorly on noisy or accented audio, resulting in high error rates.

Your Task

Reduce the Word Error Rate (WER) to below 15% on noisy and accented audio samples while maintaining transcription quality on clean audio.

You can only adjust the preprocessing and inference parameters.

You cannot retrain or fine-tune the Whisper model weights.

Use the Whisper base model for inference.

Hint 1

Hint 2

Hint 3

Solution

Prompt Engineering / GenAI

import whisper
import torchaudio
import torch

def preprocess_audio(audio_path):
    waveform, sample_rate = torchaudio.load(audio_path)
    # Normalize audio to -1 to 1
    waveform = waveform / waveform.abs().max()
    # Resample to 16000 Hz if needed
    if sample_rate != 16000:
        resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
        waveform = resampler(waveform)
    return waveform.squeeze(0).numpy()

# Load Whisper base model
model = whisper.load_model("base")

# Preprocess audio
audio = preprocess_audio("noisy_accented_audio.wav")

# Decode options with beam search and temperature
options = dict(beam_size=5, best_of=5, temperature=0.0, language="en", task="transcribe")

# Perform transcription
result = model.transcribe(audio, **options)

print("Transcription:", result["text"])

Added audio normalization and resampling to 16kHz to improve input consistency.

Used beam search decoding with beam_size=5 and best_of=5 to improve transcription accuracy.

Set temperature=0.0 to make decoding deterministic and reduce randomness.

Specified language='en' and task='transcribe' to guide the model.

Results Interpretation

Before: WER = 25%, CER = 18%

After: WER = 13%, CER = 10%

Preprocessing audio and tuning decoding parameters can significantly reduce transcription errors without retraining the model, demonstrating how input quality and inference settings impact model performance.

Bonus Experiment

Try fine-tuning the Whisper model on a small dataset of noisy and accented audio to further reduce error rates.

💡 Hint

Use transfer learning with a low learning rate and early stopping to avoid overfitting.

Practice

(1/5)

1. What is the main purpose of the Whisper model in audio transcription?

easy

A. Translate text from one language to another

B. Convert spoken words in audio files into written text

C. Generate music from text descriptions

D. Detect objects in images

Audio transcription (Whisper) in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand Whisper's function

Step 2: Compare options to Whisper's purpose

Final Answer:

Quick Check:

Solution

Step 1: Recall the official Whisper method name

Step 2: Match method call syntax

Final Answer:

Quick Check:

Solution

Step 1: Understand the output of `transcribe()`

Step 2: Identify the Python type of the output

Final Answer:

Quick Check:

Solution

Step 1: Check method call requirements

Step 2: Identify missing argument

Final Answer:

Quick Check:

Solution

Step 1: Understand model size trade-offs

Step 2: Choose model balancing speed and accuracy

Final Answer:

Quick Check:

Start learning this pattern below

Practice

Solution

Step 1: Understand Whisper's function

Step 2: Compare options to Whisper's purpose

Final Answer:

Quick Check:

Solution

Step 1: Recall the official Whisper method name

Step 2: Match method call syntax

Final Answer:

Quick Check:

Solution

Step 1: Understand the output of transcribe()

Step 2: Identify the Python type of the output

Final Answer:

Quick Check:

Solution

Step 1: Check method call requirements

Step 2: Identify missing argument

Final Answer:

Quick Check:

Solution

Step 1: Understand model size trade-offs

Step 2: Choose model balancing speed and accuracy

Final Answer:

Quick Check:

Step 1: Understand the output of `transcribe()`