
How to Use Torchaudio: Load, Transform, and Save Audio with PyTorch

Use torchaudio.load() to read an audio file into a tensor, apply transforms such as torchaudio.transforms.MelSpectrogram for feature extraction, and write audio back to disk with torchaudio.save(). Waveforms are ordinary PyTorch tensors and transforms are torch.nn.Module objects, so the results feed directly into PyTorch models and training loops.

Syntax

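torchaudio.load(filepath, normalize=True): Loads an audio file and returns a (waveform, sample_rate) tuple. The waveform is a tensor of shape [channels, frames]; with normalize=True (the default) it is float32 with values in [-1, 1].

torchaudio.save(filepath, tensor, sample_rate): Saves a 2D tensor of shape [channels, frames] as an audio file at the given sample rate.

torchaudio.transforms.*: Audio transforms such as MelSpectrogram and Resample. Each one is constructed once and then called on a waveform tensor.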

python
import torchaudio

# Load audio
waveform, sample_rate = torchaudio.load('audio.wav')

# Apply a transform
mel_spectrogram = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate)
mel_spec = mel_spectrogram(waveform)

# Save audio
torchaudio.save('output.wav', waveform, sample_rate)

Example

This example loads a WAV file, converts it to a Mel spectrogram, and saves the original audio back to disk.

python
import torchaudio

# Download a sample audio file from the torchaudio tutorial assets and load it
waveform, sample_rate = torchaudio.load(torchaudio.utils.download_asset('tutorial-assets/steam-train-whistle-daniel_simon.wav'))

print(f'Waveform shape: {waveform.shape}')
print(f'Sample rate: {sample_rate}')

# Create MelSpectrogram transform
mel_spectrogram = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate)

# Apply transform
mel_spec = mel_spectrogram(waveform)
print(f'MelSpectrogram shape: {mel_spec.shape}')

# Save the original waveform to a new file
torchaudio.save('saved_train_whistle.wav', waveform, sample_rate)

print('Audio saved as saved_train_whistle.wav')
Output
Waveform shape: torch.Size([1, 276858])
Sample rate: 44100
MelSpectrogram shape: torch.Size([1, 128, 1087])
Audio saved as saved_train_whistle.wav

The exact shapes depend on the audio file and on the transform settings.

Common Pitfalls

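  • Saving a waveform with a sample rate other than the one it was loaded or resampled at raises no error, but the file plays back at the wrong speed and pitch. Likewise, a transform built with the wrong sample_rate silently produces mis-scaled features.
  • Waveforms have shape [channels, samples], so a mono file is [1, samples] and a stereo file is [2, samples]. Forgetting the channel dimension causes shape mismatches.
  • torchaudio.load() returns float32 tensors in [-1, 1] by default (normalize=True). Passing normalize=False can return integer tensors for integer-encoded WAV files, so keep the default unless you need the raw sample values.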
python
import torchaudio

waveform, sample_rate = torchaudio.load('audio.wav')

# Wrong: writing the samples with a rate they were not recorded at.
# No error is raised, but playback speed and pitch change
# whenever sample_rate != wrong_sample_rate.
wrong_sample_rate = 16000
# torchaudio.save('wrong_save.wav', waveform, wrong_sample_rate)  # Avoid this

# Right: use the original sample rate
torchaudio.save('correct_save.wav', waveform, sample_rate)
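The shape and dtype pitfalls above can be handled in one place before any transform runs. The helper below is a hypothetical sketch, not part of torchaudio: to_mono_float is a name introduced here, and it assumes integer input is 16-bit PCM scaled by 2**15.

```python
import torch


def to_mono_float(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return a float32 mono waveform of shape [1, frames]."""
    if waveform.dtype == torch.int16:
        # Assumption: 16-bit PCM, scaled by 2**15 so samples fall in [-1, 1).
        waveform = waveform.to(torch.float32) / 32768.0
    else:
        # Other dtypes are only cast to float32, not rescaled.
        waveform = waveform.to(torch.float32)
    # Average over the channel dimension (dim 0), keeping it as size 1.
    return waveform.mean(dim=0, keepdim=True)


# A tiny stereo int16 example: 2 channels, 2 frames.
stereo = torch.tensor([[16384, -32768], [0, 0]], dtype=torch.int16)
mono = to_mono_float(stereo)
print(mono.shape)
print(mono)
```

Averaging the channels is one simple downmix. Selecting a single channel with waveform[:1] is an alternative when the channels should not be blended.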

Quick Reference

torchaudio.load(filepath, normalize=True): Load audio file as tensor and sample rate.
torchaudio.save(filepath, tensor, sample_rate): Save tensor as audio file.
torchaudio.transforms.MelSpectrogram(sample_rate): Convert waveform to Mel spectrogram.
torchaudio.transforms.Resample(orig_freq, new_freq): Change audio sample rate.
Always check tensor shape and sample rate compatibility.

Key Takeaways

Use torchaudio.load() to read audio files into PyTorch tensors with sample rate info.
Apply torchaudio.transforms for audio feature extraction like MelSpectrogram.
Save audio tensors back to files with torchaudio.save(), passing the sample rate the tensor was loaded or resampled at.
Always verify waveform shape and sample rate to avoid errors or audio distortion.
Torchaudio transforms are torch.nn.Module objects that operate on ordinary tensors, so they compose with PyTorch models and data pipelines.