Speech to Text in Python: Simple NLP Guide
Use the SpeechRecognition library in Python to convert speech into text by loading audio from a file or a microphone and passing it to a recognizer such as the Google Web Speech API. The process involves creating a recognizer object, capturing audio, and calling recognize_google() to get the text output.
Syntax
The basic syntax involves these steps:
- Create a Recognizer() object.
- Load audio from a file or microphone using AudioFile() or Microphone().
- Use recognizer.record() or recognizer.listen() to capture audio data.
- Call recognizer.recognize_google(audio) to convert speech to text.
python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile('audio.wav') as source:
    audio_data = recognizer.record(source)
text = recognizer.recognize_google(audio_data)
print(text)
Example
This example shows how to convert speech from an audio file named audio.wav into text using the Google Web Speech API through the SpeechRecognition library.
python
import speech_recognition as sr

# Initialize recognizer
recognizer = sr.Recognizer()

# Load audio file
with sr.AudioFile('audio.wav') as source:
    audio_data = recognizer.record(source)  # read the entire audio file

# Recognize speech using Google Web Speech API
try:
    text = recognizer.recognize_google(audio_data)
    print('Recognized Text:', text)
except sr.UnknownValueError:
    print('Google Speech Recognition could not understand audio')
except sr.RequestError as e:
    print(f'Could not request results from Google Speech Recognition service; {e}')
Output
Recognized Text: hello world this is a test
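The example reads the whole file and uses the default language. As a minimal sketch of a variation, the helper below transcribes only part of a recording and passes a language tag. The function name transcribe_segment and its defaults are illustrative assumptions; offset and duration (in seconds) are parameters of recognizer.record(), and language is a parameter of recognize_google().

```python
def transcribe_segment(path, offset=0.0, duration=None, language="en-US"):
    """Transcribe part of an audio file; return None if no speech is recognized.

    Hypothetical helper for illustration. sr.RequestError (offline or API
    failure) is deliberately left for the caller to handle.
    """
    # Imported inside the function so the sketch can be loaded even where
    # the SpeechRecognition package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # Skip `offset` seconds, then read `duration` seconds.
        # duration=None reads to the end of the file.
        audio_data = recognizer.record(source, offset=offset, duration=duration)
    try:
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        return None
```

For instance, transcribe_segment('audio.wav', offset=5, duration=10, language='en-GB') would send seconds 5 to 15 of the file to the API as British English. Limiting the segment can help with long recordings, since the whole clip is otherwise uploaded in a single request.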
Common Pitfalls
- Not installing the SpeechRecognition library or missing dependencies like PyAudio for microphone input.
- Using an unsupported audio format such as MP3; convert audio to WAV format for best compatibility.
- Ignoring exceptions like UnknownValueError when speech is unclear or RequestError when offline or when the API fails.
- Not having an internet connection, since recognize_google() requires one.
python
import speech_recognition as sr

recognizer = sr.Recognizer()

# Wrong: unsupported audio format and no error handling.
# AudioFile cannot read MP3, so this fails before recognition starts.
# with sr.AudioFile('audio.mp3') as source:
#     audio_data = recognizer.record(source)
# text = recognizer.recognize_google(audio_data)  # may fail without try-except

# Right: use a WAV file and handle exceptions
with sr.AudioFile('audio.wav') as source:
    audio_data = recognizer.record(source)
try:
    text = recognizer.recognize_google(audio_data)
except sr.UnknownValueError:
    text = 'Could not understand audio'
except sr.RequestError:
    text = 'API unavailable'
print(text)
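One way to avoid the format pitfall is to check a file before handing it to AudioFile and convert it when needed. The sketch below uses only the standard library plus the ffmpeg command-line tool. The helper names are hypothetical, ffmpeg is assumed to be installed and on PATH, and the 16 kHz mono target is a common choice for speech rather than a requirement of the library.

```python
import subprocess
import wave


def is_pcm_wav(path):
    """Return True if the standard-library wave module can open the file."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file, or an encoding wave cannot read.
        # EOFError: the file is empty or ends before the header is complete.
        return False


def convert_to_wav(src, dst):
    """Convert src to 16 kHz mono WAV using ffmpeg (assumed to be on PATH)."""
    subprocess.run(
        # -y overwrites dst, -ac 1 mixes down to mono, -ar sets the sample rate.
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst],
        check=True,  # raise CalledProcessError if ffmpeg reports a failure
    )
    return dst
```

A caller could then write something like path = src if is_pcm_wav(src) else convert_to_wav(src, 'converted.wav') before opening the file with sr.AudioFile(path).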
Quick Reference
Summary tips for speech to text in Python:
- Use the SpeechRecognition library for easy speech-to-text.
- Convert audio files to WAV format for best results.
- Handle exceptions to avoid crashes.
- Ensure an internet connection for the Google API.
- For microphone input, install PyAudio and use Microphone().
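The file-based examples above do not show the microphone path, so here is a minimal sketch of it, assuming PyAudio is installed and a default input device is available. The function name transcribe_from_microphone and the timeout values are illustrative choices, not part of the library.

```python
def transcribe_from_microphone(timeout=5, phrase_time_limit=10):
    """Listen once on the default microphone and return the recognized text.

    Hypothetical helper for illustration. Returns None if nobody speaks in
    time or the speech cannot be understood; sr.RequestError is left for
    the caller to handle.
    """
    # Imported inside the function so the sketch can be loaded even where
    # the SpeechRecognition package is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against one second of background noise.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        try:
            audio_data = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            return None  # no speech started within `timeout` seconds
    try:
        return recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        return None  # audio was captured but could not be understood
```

Calling print(transcribe_from_microphone()) listens for a single phrase. Setting timeout and phrase_time_limit keeps listen() from blocking indefinitely in a quiet or continuously noisy room.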
Key Takeaways
Use the SpeechRecognition library with recognize_google() to convert speech audio to text in Python.
Always handle exceptions like UnknownValueError and RequestError to manage errors gracefully.
Convert audio in unsupported formats such as MP3 to WAV so that AudioFile can read it.
An active internet connection is required for Google Web Speech API to work.
For live speech input, install PyAudio and use the Microphone class with proper permissions.
