Model Pipeline - Audio transcription (Whisper)
This pipeline converts spoken audio into written text using the Whisper model. It takes an audio recording as input, runs it through the model, and returns the transcription as text.
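
The audio-in, text-out flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the Hugging Face `transformers` library is installed, and the checkpoint name `openai/whisper-tiny` and the helper names `load_whisper` and `transcribe` are assumptions chosen for the example. Decoding audio from a file path typically also requires `ffmpeg` on the system.

```python
from typing import Callable, Optional


def load_whisper(model_name: str = "openai/whisper-tiny") -> Callable:
    """Build a speech-recognition pipeline (weights are downloaded on first use).

    The checkpoint name is an assumption for this sketch; swap in the one you use.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_name)


def transcribe(audio_path: str, asr: Optional[Callable] = None) -> str:
    """Return the transcription of one audio file as plain text.

    Pass an already-loaded pipeline as `asr` to avoid reloading the model
    for every file; if omitted, a default pipeline is created.
    """
    if asr is None:
        asr = load_whisper()
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()
```

A typical call would be `transcribe("sample.wav")`, where `sample.wav` is a placeholder path. Loading the pipeline once with `load_whisper()` and passing it to each `transcribe` call keeps batch runs from reloading the model.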

Figure: training loss (y-axis, 0.5 to 2.5) by epoch (x-axis, 1 to 5).

| Epoch | Loss ↓ | Accuracy ↑ | Observation |
|---|---|---|---|
| 1 | 2.3 | 0.45 | Model starts learning basic audio-text alignment |
| 2 | 1.8 | 0.60 | Loss decreases, accuracy improves as model learns speech patterns |
| 3 | 1.4 | 0.72 | Model better understands phonemes and word boundaries |
| 4 | 1.1 | 0.80 | Improved transcription quality, fewer errors |
| 5 | 0.9 | 0.85 | Model converges with good transcription accuracy |
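
The epoch-to-epoch changes behind the observations in the table can be worked out directly. The sketch below copies the loss and accuracy values from the table; `METRICS` and `epoch_deltas` are hypothetical names introduced only for this example.

```python
# (epoch, loss, accuracy) rows copied from the table above.
METRICS = [
    (1, 2.3, 0.45),
    (2, 1.8, 0.60),
    (3, 1.4, 0.72),
    (4, 1.1, 0.80),
    (5, 0.9, 0.85),
]


def epoch_deltas(rows):
    """Return (epoch, loss_change, accuracy_change) relative to the previous epoch."""
    deltas = []
    for (_, prev_loss, prev_acc), (epoch, loss, acc) in zip(rows, rows[1:]):
        # Round to two decimals to hide floating-point noise in the differences.
        deltas.append((epoch, round(loss - prev_loss, 2), round(acc - prev_acc, 2)))
    return deltas


for epoch, d_loss, d_acc in epoch_deltas(METRICS):
    print(f"epoch {epoch}: loss {d_loss:+.2f}, accuracy {d_acc:+.2f}")
```

The loss drop shrinks from 0.5 (epoch 1 to 2) to 0.2 (epoch 4 to 5), and the accuracy gain shrinks from 0.15 to 0.05, which is consistent with the model converging by epoch 5.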