How to Use Torchaudio: Load, Transform, and Save Audio with PyTorch
Use torchaudio.load() to read audio files into tensors, apply transforms such as torchaudio.transforms.MelSpectrogram for feature extraction, and write audio back to disk with torchaudio.save(). Because torchaudio works on ordinary PyTorch tensors, its output can be fed directly into PyTorch models and training loops.
Syntax
torchaudio.load(filepath, normalize=True): Loads an audio file and returns a tensor and sample rate.
torchaudio.save(filepath, tensor, sample_rate): Saves a tensor as an audio file.
torchaudio.transforms.*: Various audio transformations like MelSpectrogram, Resample, etc.
    import torchaudio

    # Load audio
    waveform, sample_rate = torchaudio.load('audio.wav')

    # Apply a transform
    mel_spectrogram = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate)
    mel_spec = mel_spectrogram(waveform)

    # Save audio
    torchaudio.save('output.wav', waveform, sample_rate)
Example
This example loads a WAV file, converts it to a Mel spectrogram, and saves the original audio back to disk.
    import torchaudio

    # Download a sample clip from the torchaudio tutorial assets, then load it
    path = torchaudio.utils.download_asset('tutorial-assets/steam-train-whistle-daniel_simon.wav')
    waveform, sample_rate = torchaudio.load(path)
    print(f'Waveform shape: {waveform.shape}')
    print(f'Sample rate: {sample_rate}')

    # Create MelSpectrogram transform
    mel_spectrogram = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate)

    # Apply transform
    mel_spec = mel_spectrogram(waveform)
    print(f'MelSpectrogram shape: {mel_spec.shape}')

    # Save the original waveform to a new file
    torchaudio.save('saved_train_whistle.wav', waveform, sample_rate)
    print('Audio saved as saved_train_whistle.wav')
Common Pitfalls
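- Saving with a sample rate different from the waveform's actual rate usually does not raise an error; the file simply plays back at the wrong speed and pitch. Likewise, a transform such as MelSpectrogram built with the wrong sample_rate still runs but maps frequencies incorrectly. Resample first if a different rate is needed.
- torchaudio.load() returns waveforms shaped [channels, samples] by default, so mono audio is [1, samples], not [samples]; forgetting the channel dimension causes shape mismatches.
- torchaudio.load() uses normalize=True by default and returns float32 tensors in the range [-1, 1]. With normalize=False, integer-encoded WAV files come back as integer tensors, so keep the default unless you need the raw sample values.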
    import torchaudio

    # Wrong: saving with wrong sample rate
    waveform, sample_rate = torchaudio.load('audio.wav')
    wrong_sample_rate = 16000
    # This will distort audio if sample_rate != wrong_sample_rate
    # torchaudio.save('wrong_save.wav', waveform, wrong_sample_rate)  # Avoid this

    # Right: use original sample rate
    # torchaudio.save('correct_save.wav', waveform, sample_rate)
Quick Reference
torchaudio.load(filepath, normalize=True): Load audio file as tensor and sample rate.
torchaudio.save(filepath, tensor, sample_rate): Save tensor as audio file.
torchaudio.transforms.MelSpectrogram(sample_rate): Convert waveform to Mel spectrogram.
torchaudio.transforms.Resample(orig_freq, new_freq): Change audio sample rate.
Always check tensor shape and sample rate compatibility.
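The shape and sample-rate check can be made explicit with a small validation function. This is a sketch only: check_audio and expected_rate are hypothetical names, and the rules enforced here (a 2-D float tensor with a matching rate) are assumptions about what a downstream pipeline expects.

```python
import torch


def check_audio(waveform, sample_rate, expected_rate):
    """Hypothetical helper: raise ValueError if the audio looks inconsistent."""
    if waveform.dim() != 2:
        raise ValueError(
            f"expected [channels, samples], got shape {tuple(waveform.shape)}"
        )
    if not waveform.is_floating_point():
        raise ValueError(f"expected a float tensor, got {waveform.dtype}")
    if sample_rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {sample_rate} Hz")


# A mono float tensor at the expected rate passes silently
check_audio(torch.zeros(1, 16000), 16000, expected_rate=16000)
```

Calling it right after torchaudio.load() and again before torchaudio.save() catches a missing channel dimension or a stale sample rate early.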