Audio visualizer with the Web Audio API
Overview
The Web Audio API gives you access to the raw frequency and time-domain data of any audio playing in the browser. An AnalyserNode performs a real-time FFT (Fast Fourier Transform) and hands you an array of frequency magnitudes whenever you poll it, typically once per animation frame at 60fps. The question is what to do with that array.
Architecture
The signal chain:
AudioSource → AnalyserNode → (visualization)
                  ↓
              GainNode → AudioDestination (speakers)
The analyser sits on a branch for the visualization: it taps the signal without modifying it (an AnalyserNode passes audio through unchanged). I configure it with fftSize: 2048, which gives me 1024 frequency bins covering 0Hz up to the Nyquist limit of half the sample rate (typically 22.05kHz at 44.1kHz). Each bin therefore spans about 21.5Hz (44100 / 2048).
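Wiring that chain up is short. A minimal sketch, assuming the source is an `<audio>` element (all names here are mine, not from any particular app):

```javascript
// Sketch of the signal chain above, assuming an <audio> element source.
function setupChain(audioEl) {
  const ctx = new AudioContext();
  const source = ctx.createMediaElementSource(audioEl);

  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;              // frequencyBinCount becomes 1024
  analyser.smoothingTimeConstant = 0.8; // temporal smoothing of magnitudes

  const gain = ctx.createGain();

  // The analyser passes audio through unmodified, so it can sit inline.
  source.connect(analyser);
  analyser.connect(gain);
  gain.connect(ctx.destination);

  return { ctx, analyser, gain };
}

// Width of one frequency bin in Hz: sampleRate / fftSize.
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize; // 44100 / 2048 ≈ 21.5 Hz
}
```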
Visualization modes
Waveform: The simplest — plot getByteTimeDomainData() as a line. This gives you the raw waveform, which is satisfying for percussive music but kind of boring for sustained tones.
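A minimal render loop for this mode might look like the following; `analyser` and the 2D canvas context `g` are assumed to already exist:

```javascript
// Draw one frame of the time-domain waveform as a polyline, then
// schedule the next frame. `analyser` and `g` are assumed inputs.
function drawWaveform(analyser, g, width, height) {
  const data = new Uint8Array(analyser.fftSize); // one sample per step
  analyser.getByteTimeDomainData(data);          // values 0-255, 128 = silence
  g.clearRect(0, 0, width, height);
  g.beginPath();
  for (let i = 0; i < data.length; i++) {
    const x = (i / (data.length - 1)) * width;
    const y = (data[i] / 255) * height;          // map byte value to canvas y
    if (i === 0) g.moveTo(x, y); else g.lineTo(x, y);
  }
  g.stroke();
  if (typeof requestAnimationFrame === 'function') {
    requestAnimationFrame(() => drawWaveform(analyser, g, width, height));
  }
}
```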
Spectrum bars: Classic. Map each frequency bin (or a group of bins) to a bar height. I use a logarithmic frequency scale so bass frequencies get adequate visual space: on a linear scale, everything from 20Hz to 2kHz, where most of the musical content lives, would squeeze into roughly the first 90 of the 1024 bars, leaving the other 900-odd bars for the top of the spectrum, which carries far less perceptual detail.
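One way to build that log mapping is to precompute, for each bar, the range of FFT bins it averages. The 20Hz floor is my assumption, not anything canonical:

```javascript
// Map numBars display bars onto FFT bins on a log-frequency scale.
// Returns, for each bar, the [startBin, endBin) range it aggregates.
function logBarRanges(numBars, binCount, sampleRate, fftSize, fMin = 20) {
  const binHz = sampleRate / fftSize;
  const fMax = sampleRate / 2; // Nyquist
  const ranges = [];
  for (let b = 0; b < numBars; b++) {
    // Log-spaced band edges between fMin and fMax.
    const lo = fMin * Math.pow(fMax / fMin, b / numBars);
    const hi = fMin * Math.pow(fMax / fMin, (b + 1) / numBars);
    const start = Math.max(0, Math.floor(lo / binHz));
    const end = Math.min(binCount, Math.max(start + 1, Math.ceil(hi / binHz)));
    ranges.push([start, end]);
  }
  return ranges;
}

// Bar height = average magnitude over the bar's bin range.
function barHeights(freqData, ranges) {
  return ranges.map(([s, e]) => {
    let sum = 0;
    for (let i = s; i < e; i++) sum += freqData[i];
    return sum / (e - s);
  });
}
```

Each frame, fill a Uint8Array with getByteFrequencyData() and feed it to barHeights() with the precomputed ranges.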
Radial: My favorite. Arrange the frequency bins in a circle, with magnitude controlling radius. Bass in the center, treble on the outside. When a kick drum hits, the whole thing pulses outward. When a hi-hat hits, the outer ring flickers. It makes the frequency content spatially intuitive.
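A sketch of the geometry, under my reading of the layout: bin index sets the angle around the circle, a base radius grows slightly with frequency so bass sits nearer the center, and magnitude pushes each point outward. The constants are illustrative:

```javascript
// Pure geometry for the radial layout; rendering is left to the caller.
// freqData is a Uint8Array of magnitudes (0-255), e.g. from
// getByteFrequencyData(). (cx, cy) is the center of the circle.
function radialPoints(freqData, cx, cy, innerR, outerR) {
  const n = freqData.length;
  const pts = [];
  for (let i = 0; i < n; i++) {
    const angle = (i / n) * 2 * Math.PI;
    // Base radius grows with bin index, so bass starts nearer the center.
    const base = innerR + (i / n) * (outerR - innerR) * 0.5;
    // Magnitude pushes the point outward toward outerR.
    const r = base + (freqData[i] / 255) * (outerR - base);
    pts.push([cx + r * Math.cos(angle), cy + r * Math.sin(angle)]);
  }
  return pts;
}
```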
Particle field: 500 particles whose behavior is driven by frequency bands. Low frequencies control gravity. Mid frequencies control velocity. High frequencies control color temperature. The result is a particle system that “dances” — not randomly, but in response to the actual harmonic content of the music. Rhythmic music produces rhythmic motion. Ambient music produces slow drifts.
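The particle mapping needs per-band energies each frame. A simple version splits the spectrum into low/mid/high averages; the band edges here are my choices, not anything canonical:

```javascript
// Average magnitude (normalized to 0..1) over three frequency bands.
// freqData is a Uint8Array from getByteFrequencyData().
function bandEnergies(freqData, sampleRate, fftSize) {
  const binHz = sampleRate / fftSize;
  const avg = (loHz, hiHz) => {
    const s = Math.floor(loHz / binHz);
    const e = Math.min(freqData.length, Math.ceil(hiHz / binHz));
    let sum = 0;
    for (let i = s; i < e; i++) sum += freqData[i];
    return sum / Math.max(1, e - s) / 255; // normalize to 0..1
  };
  return {
    low: avg(20, 250),     // drives gravity
    mid: avg(250, 2000),   // drives velocity
    high: avg(2000, 12000) // drives color temperature
  };
}
```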
Challenges
Latency. The analyser introduces a small delay of roughly half an FFT window, about 23ms for 2048 samples at 44.1kHz (a full window is about 46ms). For visualization this is imperceptible. For anything requiring sync with video, it matters.
Perceptual weighting. Raw FFT magnitudes don’t correspond to perceived loudness. I apply A-weighting to approximate how human hearing works — we’re most sensitive around 2-5kHz and less sensitive at the extremes. Without this, bass-heavy music looks underwhelming even when it sounds overwhelming.
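The A-weighting curve has a standard closed form (IEC 61672). Evaluating it at each bin's center frequency gives a per-bin gain to apply before drawing:

```javascript
// Standard A-weighting curve. Constants are the squared corner
// frequencies from the IEC 61672 definition (20.6, 107.7, 737.9,
// and 12194 Hz).
function aWeightDb(f) {
  const f2 = f * f;
  const r = (148693636 * f2 * f2) /
    ((f2 + 424.36) *
     Math.sqrt((f2 + 11599.29) * (f2 + 544496.41)) *
     (f2 + 148693636));
  return 20 * Math.log10(r) + 2.0; // offset so the curve is ~0 dB at 1 kHz
}

// Linear gain to multiply into a bin's magnitude.
function aWeightGain(f) {
  return Math.pow(10, aWeightDb(f) / 20);
}
```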
Microphone input. Using getUserMedia() to visualize ambient sound (from a room mic) instead of a file. Works great for live performances. The permissions UX is terrible.
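Swapping the file source for a microphone is a small change in the graph: the source node comes from a MediaStream instead of a media element. A sketch:

```javascript
// Mic-input variant of the chain. Async because getUserMedia prompts
// the user for permission before resolving.
async function setupMicChain() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  source.connect(analyser);
  // Deliberately not connected to ctx.destination: routing a room mic
  // to the speakers would cause feedback.
  return { ctx, analyser };
}
```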