Challenge - 5 Problems
Whisper Transcription Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Output of Whisper transcription code snippet
What is the output of the following Python code that uses Whisper to transcribe a short audio file?
Prompt Engineering / GenAI
import whisper

model = whisper.load_model('small')
result = model.transcribe('audio_sample.wav')
print(result['text'])
💡 Hint
The model transcribes the audio content into text stored in the 'text' key of the result dictionary.
✗ Incorrect
The Whisper model's transcribe method returns a dictionary whose 'text' key holds the transcription of the audio file. If the audio file exists and the speech is clear, the snippet prints the spoken words as text.
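To make the shape of that return value concrete, here is a minimal sketch of the dictionary that transcribe produces. The values are illustrative mocks (a real run requires the openai-whisper package and an actual audio file); the 'text', 'segments', and 'language' keys match what the library returns.

```python
# Mock of the dictionary returned by model.transcribe()
# (illustrative values only; no audio is actually transcribed here).
result = {
    "text": " Hello, this is a short audio sample.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 3.2,
         "text": " Hello, this is a short audio sample."},
    ],
    "language": "en",
}

# The quiz snippet prints only the full transcription:
print(result["text"])
```

Accessing result['text'] therefore prints the complete transcription in one string, while result['segments'] carries the per-segment timestamps.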
❓ Model Choice
intermediate · 1:30 remaining
Choosing the right Whisper model for fast transcription
You want to transcribe a large number of short audio clips quickly with reasonable accuracy. Which Whisper model should you choose?
💡 Hint
Smaller models run faster but with less accuracy.
✗ Incorrect
The 'tiny' model is the fastest and smallest Whisper model, suitable for quick transcription of many short clips with acceptable accuracy.
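The speed/accuracy trade-off follows model size. As a sketch, the helper below picks the smallest candidate model by approximate parameter count; the figures are the rough sizes published in the openai/whisper README, so treat them as guides rather than exact numbers.

```python
# Approximate parameter counts (millions) for Whisper model sizes,
# taken roughly from the openai/whisper README.
WHISPER_SIZES = {
    "tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550,
}

def fastest_model(candidates):
    """Pick the smallest (and therefore fastest) model among candidates."""
    return min(candidates, key=WHISPER_SIZES.__getitem__)

print(fastest_model(["tiny", "small", "medium"]))  # -> tiny
```

For batch transcription of many short clips, 'tiny' (or 'base', if accuracy matters more) is the usual starting point.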
❓ Hyperparameter
advanced · 2:00 remaining
Effect of temperature parameter in Whisper transcription
In Whisper's transcribe method, what is the effect of increasing the 'temperature' parameter from 0.0 to 1.0?
💡 Hint
Temperature controls randomness in text generation.
✗ Incorrect
Higher temperature values increase randomness in the output, which can lead to more varied but less consistent transcriptions.
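The mechanism behind this is temperature-scaled sampling: logits are divided by the temperature before the softmax, so higher temperatures flatten the token distribution. The function below is a generic sketch of that idea, not Whisper's internal decoder (by default Whisper actually tries a tuple of fallback temperatures, starting at 0.0, and at 0.0 it decodes greedily rather than sampling).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax.

    Higher temperature -> flatter distribution -> more random sampling.
    At temperature 0.0 we return a greedy (argmax) distribution instead
    of dividing by zero, mirroring deterministic decoding.
    """
    if temperature == 0.0:
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=logits.__getitem__)] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.0))  # deterministic argmax
print(softmax_with_temperature(logits, 1.0))  # softer, more varied
```

At 0.0 the top token always wins (repeatable transcriptions); at 1.0 lower-ranked tokens get real probability mass, which is where the extra variation comes from.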
❓ Metrics
advanced · 1:30 remaining
Evaluating Whisper transcription quality
Which metric is most appropriate to measure the accuracy of Whisper's transcriptions compared to ground truth text?
💡 Hint
This metric counts word-level differences between predicted and true text.
✗ Incorrect
Word Error Rate (WER) measures the number of word insertions, deletions, and substitutions needed to match the transcription to the reference, making it ideal for speech recognition evaluation.
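WER is just word-level edit distance divided by the number of reference words. Here is a minimal self-contained sketch; in practice a library such as jiwer (which also handles normalization) is the usual choice.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + insertions + deletions)
    / number of reference words, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

# 1 deletion over 6 reference words ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 0.0 means a perfect transcription; values above 1.0 are possible when the hypothesis contains many insertions.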
🔧 Debug
expert · 2:30 remaining
Debugging a Whisper transcription error
You run the following code but get a RuntimeError: CUDA out of memory. What is the best way to fix this error?
import whisper

model = whisper.load_model('large')
result = model.transcribe('long_audio.wav')
print(result['text'])
💡 Hint
Large models use more GPU memory; smaller models use less.
✗ Incorrect
The 'large' Whisper model requires a lot of GPU memory. Using a smaller model reduces memory needs and avoids the out-of-memory error.
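One way to operationalize that fix is to pick the largest model that fits the free GPU memory. The helper below is a hypothetical sketch, not part of the whisper API; the VRAM figures are the rough requirements listed in the openai/whisper README, so treat them as assumptions.

```python
# Rough VRAM requirements (GB) per Whisper model size, per the
# openai/whisper README ("Required VRAM"); approximate values.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def pick_model(free_vram_gb: float) -> str:
    """Hypothetical helper: return the largest Whisper model whose rough
    VRAM requirement fits in the given free GPU memory; fall back to
    'tiny' (which can also run on CPU) when nothing fits."""
    fitting = [m for m, need in VRAM_GB.items() if need <= free_vram_gb]
    return max(fitting, key=VRAM_GB.__getitem__) if fitting else "tiny"

print(pick_model(4))   # -> small
print(pick_model(16))  # -> large
```

Alternatives to downsizing the model include moving inference off the GPU entirely, e.g. whisper.load_model('large', device='cpu') combined with model.transcribe(path, fp16=False), at the cost of slower transcription.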