Prompt Engineering / GenAI (~20 mins)

Audio transcription (Whisper) in Prompt Engineering / GenAI - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of Whisper transcription code snippet
What is the output of the following Python code that uses Whisper to transcribe a short audio file?
import whisper
model = whisper.load_model('small')
result = model.transcribe('audio_sample.wav')
print(result['text'])
A. "[Noise] Unintelligible speech detected."
B. "Hello, this is a test audio for transcription."
C. SyntaxError: invalid syntax
D. FileNotFoundError: [Errno 2] No such file or directory: 'audio_sample.wav'
Hint: The model transcribes the audio into text and stores it under the 'text' key of the returned dictionary.
Model Choice
intermediate
Choosing the right Whisper model for fast transcription
You want to transcribe a large number of short audio clips quickly with reasonable accuracy. Which Whisper model should you choose?
A. tiny
B. large-v2
C. medium
D. large
Hint: Smaller models run faster but are less accurate.
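As a rough guide, the openai/whisper README lists approximate parameter counts for the checkpoints: tiny ≈39M, base ≈74M, small ≈244M, medium ≈769M, large ≈1550M. The sketch below treats parameter count as a proxy for speed, which is an assumption: real throughput also depends on hardware, audio length, and decoding settings. `choose_model` is a hypothetical helper, not part of the whisper package.

```python
# Approximate parameter counts (millions) from the openai/whisper README.
MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def choose_model(priority: str) -> str:
    """Hypothetical helper: pick a Whisper model name by priority.

    Assumes fewer parameters means faster inference (a rough proxy only).
    """
    by_size = sorted(MODEL_PARAMS_M, key=MODEL_PARAMS_M.get)  # smallest first
    if priority == "speed":
        return by_size[0]   # smallest model
    if priority == "accuracy":
        return by_size[-1]  # largest model
    raise ValueError(f"unknown priority: {priority!r}")


print(choose_model("speed"))     # tiny
print(choose_model("accuracy"))  # large
```

The returned name would then be passed to `whisper.load_model(...)`, loading the model once and reusing it for every clip.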
Hyperparameter
advanced
Effect of temperature parameter in Whisper transcription
In Whisper's transcribe method, what is the effect of increasing the 'temperature' parameter from 0.0 to 1.0?
A. Decreases transcription speed significantly
B. Improves transcription accuracy by reducing errors
C. Increases randomness in transcription, possibly producing more diverse but less stable text
D. Switches the model to a different language automatically
Hint: Temperature controls the randomness of the decoded text; 0.0 means greedy (deterministic) decoding.
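In the openai/whisper decoder, a temperature of 0 picks the most likely token at each step (greedy decoding), while a positive temperature divides the logits by that value and samples from the resulting distribution. By default, `transcribe()` starts at 0.0 and only retries at higher temperatures (0.2, 0.4, ..., 1.0) when a decoded segment fails its quality checks. The pure-Python sketch below uses made-up logits to show why higher temperatures give more varied text: probability mass spreads out across candidate tokens. The function names are illustrative, not part of the whisper API.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Greedy at temperature 0, otherwise sample from the scaled distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    top = softmax_with_temperature(logits, t)[0]
    print(f"T={t}: P(top token) = {top:.2f}")  # shrinks as T grows

rng = random.Random(0)
print([pick_token(logits, 0, rng) for _ in range(5)])    # always token 0
print([pick_token(logits, 1.0, rng) for _ in range(5)])  # sampled from the distribution
```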
Metrics
advanced
Evaluating Whisper transcription quality
Which metric is most appropriate to measure the accuracy of Whisper's transcriptions compared to ground truth text?
A. Word Error Rate (WER)
B. Mean Squared Error (MSE)
C. Accuracy Score
D. F1 Score
Hint: This metric counts word-level substitutions, deletions, and insertions between the predicted and reference text.
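Word Error Rate is usually defined as WER = (S + D + I) / N, where S, D, and I are the minimum numbers of word substitutions, deletions, and insertions that turn the reference into the hypothesis, and N is the number of reference words. A minimal sketch using word-level edit distance follows; it skips the text normalization (casing, punctuation) that real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("hello this is a test", "hello this is the test"))  # 0.2
```

One substituted word out of five reference words gives a WER of 0.2. For real evaluations, a maintained library such as jiwer provides a tested `wer()` function.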
Debug
expert
Debugging a Whisper transcription error
You run the following code but get a RuntimeError: CUDA out of memory. What is the best way to fix this error?
import whisper
model = whisper.load_model('large')
result = model.transcribe('long_audio.wav')
print(result['text'])
A. Run the code without a GPU by setting device='cpu' in load_model()
B. Increase the batch size to process more audio at once
C. Use a higher temperature value in transcribe()
D. Switch to a smaller model like 'medium' or 'small' to reduce GPU memory usage
Hint: Larger models need more GPU memory; smaller models need less.
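One way to apply this fix automatically is to retry with progressively smaller models when an out-of-memory error occurs. The sketch below is hypothetical: `transcribe_with_fallback` takes the model loader as a parameter (in real use you might pass `whisper.load_model`), and it assumes the failure surfaces as a `RuntimeError` whose message contains "out of memory", as in the question above. A stand-in `FakeModel` is used so the example runs without a GPU or Whisper installed.

```python
from typing import Callable, Sequence, Tuple

# Ordered largest to smallest: try the most accurate model first (an assumption).
DEFAULT_SIZES = ("large", "medium", "small", "base", "tiny")


def transcribe_with_fallback(
    audio_path: str,
    load_model: Callable[[str], object],
    sizes: Sequence[str] = DEFAULT_SIZES,
) -> Tuple[dict, str]:
    """Hypothetical helper: retry with smaller models on out-of-memory errors."""
    last_error = None
    for name in sizes:
        try:
            model = load_model(name)
            return model.transcribe(audio_path), name
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # a different failure; don't hide it
            last_error = err  # too big for this GPU, try the next size
    raise RuntimeError(f"all model sizes ran out of memory: {last_error}")


class FakeModel:
    """Stand-in loader/model: pretend only 'small' and below fit in memory."""

    def __init__(self, name: str):
        self.name = name

    def transcribe(self, audio_path: str) -> dict:
        if self.name in ("large", "medium"):
            raise RuntimeError("CUDA out of memory")
        return {"text": f"transcribed {audio_path} with {self.name}"}


result, used = transcribe_with_fallback("long_audio.wav", FakeModel)
print(used)            # small
print(result["text"])  # transcribed long_audio.wav with small
```

In a real GPU session you may also need to release the failed model's memory before loading the next size.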

Practice

1. What is the main purpose of the Whisper model in audio transcription?
easy
A. Translate text from one language to another
B. Convert spoken words in audio files into written text
C. Generate music from text descriptions
D. Detect objects in images

Solution

  1. Step 1: Understand Whisper's function

    Whisper is designed to listen to audio and write down what it hears as text.
  2. Step 2: Compare options to Whisper's purpose

    Only option B, "Convert spoken words in audio files into written text", matches this function; the other options describe unrelated tasks.
  3. Final Answer:

    Convert spoken words in audio files into written text -> Option B
  4. Quick Check:

    Whisper transcribes speech to text [OK]
Hint: Whisper turns speech into text; it does not translate written text, generate music, or work with images [OK]
Common Mistakes:
  • Confusing transcription with translation
  • Thinking Whisper generates images or music
  • Mixing audio transcription with image recognition
2. Which of the following is the correct way to call the Whisper model's transcription method in Python?
easy
A. model.audio_transcribe()
B. model.transcript(audio_file)
C. model.transcribe_audio(audio_file)
D. model.transcribe(audio_file)

Solution

  1. Step 1: Recall the official Whisper method name

    The method to get text from audio is called transcribe().
  2. Step 2: Match method call syntax

    Option D calls model.transcribe(audio_file), which uses the correct method name and passes the audio file as its argument.
  3. Final Answer:

    model.transcribe(audio_file) -> Option D
  4. Quick Check:

    Use transcribe() method for transcription [OK]
Hint: Remember method name is exactly 'transcribe' with parentheses [OK]
Common Mistakes:
  • Using incorrect method names like 'transcript' or 'transcribe_audio'
  • Omitting parentheses when calling the method
  • Confusing method with attribute access
3. Given the following Python code using Whisper, what will be the output type of result?
import whisper
model = whisper.load_model('small')
audio_path = 'speech.mp3'
result = model.transcribe(audio_path)
print(type(result))
medium
A.
B.
C. <class 'dict'>
D.

Solution

  1. Step 1: Understand the output of transcribe()

    The transcribe() method returns a dictionary with keys such as 'text' (the full transcription), 'segments' (timestamped chunks), and 'language' (the detected language).
  2. Step 2: Identify the Python type of the output

    Since the output holds multiple pieces of information, it is a dict, not a string or list.
  3. Final Answer:

    <class 'dict'> -> Option C
  4. Quick Check:

    Whisper transcribe returns dict with transcription text [OK]
Hint: Whisper returns a dict with keys, not just a string [OK]
Common Mistakes:
  • Assuming output is a plain string of text
  • Thinking output is a list of words
  • Confusing tuple with dictionary
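Beyond 'text', the dictionary's 'segments' list is handy for subtitles: in the openai/whisper package each segment includes, among other fields, 'start' and 'end' times in seconds and its own 'text'. The sketch below formats segments as timestamped lines. It uses a hand-written sample dictionary with made-up values (fields trimmed) instead of a real transcription, so it runs without the model; `format_segments` is a hypothetical helper.

```python
def format_segments(result: dict) -> list:
    """Turn a Whisper-style result dict into '[start -> end] text' lines."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines


# Made-up sample shaped like model.transcribe() output (most fields omitted).
sample_result = {
    "text": " Hello, this is a test. Thanks for listening.",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello, this is a test."},
        {"start": 2.5, "end": 4.0, "text": " Thanks for listening."},
    ],
    "language": "en",
}

print(type(sample_result))  # <class 'dict'>
for line in format_segments(sample_result):
    print(line)
# [0.00 -> 2.50] Hello, this is a test.
# [2.50 -> 4.00] Thanks for listening.
```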
4. You run this code but get an error:
import whisper
model = whisper.load_model('medium')
result = model.transcribe()
What is the likely cause of the error?
medium
A. Missing audio file argument in transcribe() call
B. Model size 'medium' is not supported
C. transcribe() method does not exist
D. Audio file path is incorrect

Solution

  1. Step 1: Check method call requirements

    The transcribe() method requires an audio argument: a file path (or audio already loaded as a NumPy array or torch tensor).
  2. Step 2: Identify missing argument

    The code calls transcribe() with no argument, so Python raises a TypeError for the missing required argument.
  3. Final Answer:

    Missing audio file argument in transcribe() call -> Option A
  4. Quick Check:

    transcribe() needs audio file input [OK]
Hint: Always pass audio file path to transcribe() [OK]
Common Mistakes:
  • Forgetting to provide audio file argument
  • Assuming model size 'medium' is invalid
  • Thinking transcribe() needs no arguments
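A small guard can turn both failure modes from this question (a missing argument and a wrong path) into clear error messages before the model runs. `safe_transcribe` below is a hypothetical helper, not part of whisper, and it only handles file paths, even though `transcribe()` also accepts audio already loaded as an array. A stand-in `StubModel` is used so the example runs without Whisper installed.

```python
import os
import tempfile


def safe_transcribe(model, audio_path=None):
    """Hypothetical wrapper: validate the audio path, then call model.transcribe()."""
    if audio_path is None:
        raise ValueError("transcribe() needs an audio file path, e.g. 'speech.mp3'")
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"audio file not found: {audio_path}")
    return model.transcribe(audio_path)


class StubModel:
    """Stand-in for a loaded Whisper model; returns a fixed-format result."""

    def transcribe(self, audio_path):
        return {"text": f"(transcript of {os.path.basename(audio_path)})"}


with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    path = f.name  # an empty placeholder file, just so the path exists

print(safe_transcribe(StubModel(), path)["text"])
os.remove(path)
```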
5. You want to transcribe a long audio file quickly but can accept slightly less accuracy. Which Whisper model size should you choose?
hard
A. tiny
B. medium
C. large
D. small

Solution

  1. Step 1: Understand model size trade-offs

    Smaller models like 'tiny' are fastest but less accurate; larger models are slower but more accurate.
  2. Step 2: Choose model balancing speed and accuracy

    The 'tiny' model transcribes fastest, and its lower accuracy is an acceptable trade-off for long audio here.
  3. Final Answer:

    tiny -> Option A
  4. Quick Check:

    Choose 'tiny' for fastest transcription with some accuracy loss [OK]
Hint: Pick 'tiny' for fastest transcription with some accuracy trade-off [OK]
Common Mistakes:
  • Choosing 'small' expecting fastest speed
  • Picking 'large' for speed
  • Confusing 'medium' as fastest