Speechdft-16-8-mono-5secs.wav

    # Quick sanity check – plot the waveform
    plt.figure(figsize=(10, 2))
    plt.plot(np.arange(len(audio_float)) / sr, audio_float, lw=0.5)
    plt.title('Waveform (5 s of speech)')
    plt.xlabel('Time (s)')
    plt.ylabel('Amplitude')
    plt.show()

The plot should show a familiar “wiggly” speech trace, with a modest amount of quantisation “step‑noise” that is typical of 8‑bit audio.

3. A First‑Look Discrete Fourier Transform (DFT)

The DFT is the workhorse that turns a time‑domain signal into its frequency‑domain representation. Let’s compute a single‑sided magnitude spectrum and visualise it.

    # Compute 13 MFCCs (typical default)
    mfccs = librosa.feature.mfcc(y=y, sr=sr_lib, n_mfcc=13, n_fft=512, hop_length=256)
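The shape of the resulting `mfccs` array can be estimated before running the extraction. The sketch below assumes librosa's default centred framing (`center=True`), under which the frame count is `1 + n_samples // hop_length`, and a 5 s clip at 16 kHz as the file name suggests. `expected_frames` is a hypothetical helper, not a librosa function.

```python
def expected_frames(n_samples, hop_length):
    """Frame count for centred framing (assumes librosa's center=True default)."""
    return 1 + n_samples // hop_length

n_samples = 5 * 16000          # 5 s at 16 kHz, assumed from the file name
n_mfcc = 13
frames = expected_frames(n_samples, hop_length=256)
print((n_mfcc, frames))        # expected shape of the MFCC matrix
```

If the real `mfccs.shape` differs, the clip is probably not exactly 5 s long or was resampled to a different rate.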

    # -------------------------------------------------
    # 1️⃣ Load the wav file
    # -------------------------------------------------
    sr, audio_int = wavfile.read('speechdft-16-8-mono-5secs.wav')
    print(f'Sample rate: {sr} Hz')
    print(f'Data type: {audio_int.dtype}, shape: {audio_int.shape}')

    import numpy as np
    from scipy.io import wavfile
    import matplotlib.pyplot as plt

    # -------------------------------------------------
    # 3️⃣ Compute the DFT (via FFT) – only the positive frequencies
    # -------------------------------------------------
    N = len(audio_float)                 # number of samples = 5 s × 16 kHz = 80 000
    fft_vals = np.fft.rfft(audio_float)  # real‑valued FFT → N/2+1 points
    fft_mag = np.abs(fft_vals) / N       # normalise magnitude

    # Frequency axis (Hz)
    freqs = np.fft.rfftfreq(N, d=1/sr)
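One way to gain confidence in the FFT and frequency‑axis steps is to run them on a synthetic tone whose frequency is known in advance. The sketch below assumes a 440 Hz sine at the same 16 kHz rate. The tone, its one‑second duration and the variable names are illustrative choices, not part of the original recording.

```python
import numpy as np

sr_demo = 16000                     # assumed sample rate, matching the speech file
t = np.arange(sr_demo) / sr_demo    # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t)

n = len(tone)
mag = np.abs(np.fft.rfft(tone)) / n            # magnitude divided by the sample count
freq_axis = np.fft.rfftfreq(n, d=1 / sr_demo)  # bin centres in Hz

peak_hz = freq_axis[np.argmax(mag)]
print(f'Peak at {peak_hz:.1f} Hz')
```

With this normalisation a unit‑amplitude sine peaks at 0.5 rather than 1.0, because half of its energy sits in the negative‑frequency bin that `rfft` does not return.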

    import librosa
    import librosa.display

    # Load with librosa (it handles 8‑bit conversion internally)
    y, sr_lib = librosa.load('speechdft-16-8-mono-5secs.wav', sr=16000, mono=True)

    plt.figure(figsize=(10, 3))
    librosa.display.specshow(log_S, sr=sr, hop_length=hop_len,
                             x_axis='time', y_axis='mel', cmap='magma')
    plt.title('Log‑Mel Spectrogram (40 bands)')
    plt.colorbar(format='%+2.0f dB')
    plt.tight_layout()
    plt.show()

| Challenge | Quick Fix |
|-----------|-----------|
| Clipping / low dynamic range | Apply a simple gain (`audio_float *= 1.5`) before feature extraction, but beware of re‑quantisation if you write back to 8‑bit. |
| **Noise

    y, sr = librosa.load('speechdft-16-8-mono-5secs.wav', sr=16000)

    import librosa
    import librosa.display

    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_len,
                                       n_mels=n_mels, fmax=sr/2)
    log_S = librosa.power_to_db(S, ref=np.max)
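To make the decibel conversion less of a black box, here is a minimal NumPy sketch of what a power‑to‑dB mapping with `ref=np.max` computes. The `1e-10` floor and the 80 dB dynamic‑range cap are assumed here to mirror librosa's `amin` and `top_db` defaults, and `power_to_db_sketch` is a hypothetical name used only for illustration.

```python
import numpy as np

def power_to_db_sketch(S, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram to dB relative to its own maximum."""
    S = np.asarray(S, dtype=float)
    ref = S.max()
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    # Cap the dynamic range so very quiet bins do not dominate the colour scale
    return np.maximum(log_spec, log_spec.max() - top_db)

toy_power = np.array([[1.0, 0.1], [0.001, 1e-12]])
toy_db = power_to_db_sketch(toy_power)
print(toy_db)
```

The loudest bin lands at 0 dB and everything else is negative, which is why the colour bar of the log‑mel plot runs from 0 dB downwards.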

    # -------------------------------------------------
    # 2️⃣ Convert 8‑bit unsigned PCM to float [-1, 1)
    # -------------------------------------------------
    # 8‑bit PCM in wav files is typically unsigned (0‑255)
    audio_float = (audio_int.astype(np.float32) - 128) / 128.0   # now in [-1, 1)
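The conversion above can be checked on a few hand‑picked sample codes. The following is a minimal sketch, assuming unsigned 8‑bit PCM as described. The helper name `u8_to_float` is hypothetical and not part of the original walkthrough.

```python
import numpy as np

def u8_to_float(samples_u8):
    """Map unsigned 8-bit PCM codes (0..255) to float32 centred on zero.

    Hypothetical helper for illustration; mirrors (x - 128) / 128.
    """
    return (np.asarray(samples_u8).astype(np.float32) - 128.0) / 128.0

# Minimum, midpoint (silence) and maximum codes of 8-bit PCM
codes = np.array([0, 128, 255], dtype=np.uint8)
floats = u8_to_float(codes)
print(floats)  # minimum maps to -1.0, silence to 0.0, maximum to just below 1.0
```

Casting to float before subtracting matters: subtracting 128 directly from a `uint8` array would wrap around instead of going negative.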

    # Parameters
    n_fft = 1024
    hop_len = 512
    n_mels = 40
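These parameters fix the time and frequency resolution of the analysis. A short worked calculation, assuming the 16 kHz sample rate used throughout (the `*_ms` and `bin_hz` names are illustrative):

```python
sr_hz = 16000      # assumed sample rate of the recording
n_fft = 1024
hop_len = 512

window_ms = 1000 * n_fft / sr_hz    # duration of one analysis window
hop_ms = 1000 * hop_len / sr_hz     # time step between successive frames
bin_hz = sr_hz / n_fft              # spacing between FFT frequency bins

print(f'window = {window_ms} ms, hop = {hop_ms} ms, bin spacing = {bin_hz} Hz')
```

Each frame therefore spans 64 ms and overlaps its neighbour by half. Halving `n_fft` would sharpen timing at the cost of coarser frequency bins.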