The promise of perfect audio—the Nyquist Theorem
Most people who’ve looked at digital audio before know about the Nyquist theorem: if you sample an analog signal at a rate of at least twice its highest frequency component, you can convert it back to analog, passing it through a low-pass filter, and get back the same thing you put in. Exactly. Perfectly.
The real world
In the real world, though, many people argue that analog “sounds better.” How can this be, if digital audio is perfect?
For one thing, we’ve grown to like some of the deficiencies of analog recording. Just as tube amplifiers give a more pleasant distortion and compression to musical signals than transistors, analog tape similarly warms up and fattens the sound.
Of course, this alone isn’t a reason to forsake digital’s many conveniences. We can always use other means, such as tube compressors, to fatten the sound if needed. The real trouble lies with the real-world problems Nyquist didn’t warn us about.
First, there is no such thing as the perfect low-pass filter required by Nyquist’s theorem. A real filter has a finite slope, so we need to set its cut-off a little lower than theory. Also, a steep filter has a lot of phase shift near and above the cutoff. And some aliasing is bound to leak through at the very high end. A technique called oversampling has been developed to reduce these problems.
Another big problem is finite wordlength effects—we’re using 16-bit samples, not the pure numbers of the Nyquist theorem, so we have to compromise the sample values. To start, 16 bits is not as great as it seems. Yes, it translates into 96 dB dynamic range, but that’s an absolute ceiling—you can’t go any higher. So, the average music level must be much lower in order to allow headroom for peaks. And, at the low amplitude end, distortion of small-signal components is very high, contributing to the “brittle” sound that many people describe with digital audio. On top of this, any gain change (from mixing tracks or changing volumes) causes individual samples to be rounded to the nearest bit level, adding distortion. Fortunately, a technique called dithering relieves these problems.
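As a rough illustration of the numbers above, here is a small Python sketch. It computes the theoretical dynamic range of an N-bit word, then quantizes a sine wave whose peak is smaller than half a step, with and without dither. The function names, the 0.4 LSB test signal, and the choice of triangular (TPDF) dither are assumptions made for this example, not something the article specifies.

```python
import math
import random

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

def quantize(x, dither=False, rng=random):
    """Round x (measured in LSBs) to the nearest integer level.

    With dither=True, triangular (TPDF) noise spanning +/-1 LSB is added first.
    """
    if dither:
        x += rng.random() - rng.random()  # difference of two uniforms is triangular
    return math.floor(x + 0.5)

def projection(reference, output):
    """How much of `reference` is present in `output` (1.0 = fully preserved)."""
    return sum(r * o for r, o in zip(reference, output)) / sum(r * r for r in reference)

# A sine with a peak of 0.4 LSB never crosses a rounding threshold on its own.
n = 20000
signal = [0.4 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

rng = random.Random(1)  # fixed seed so the run is repeatable
plain = [quantize(s) for s in signal]
dithered = [quantize(s, dither=True, rng=rng) for s in signal]

recovered_gain = projection(signal, dithered)

print(f"16-bit dynamic range: {dynamic_range_db(16):.2f} dB")
print("undithered output is all zeros:", all(v == 0 for v in plain))
print(f"signal preserved with dither: {recovered_gain:.2f}")
```

Without dither the small signal simply disappears. With dither it survives on average, at the cost of a steady low-level noise floor.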
Clock jitter is another problem. If the sample clock timing is not perfect, it creates another kind of distortion. For a self-contained unit, the solution is simply more accurate timing; reducing timing errors reduces the distortion to a negligible level. When digitally interfacing with other units, though, the issue becomes a little more complex, but is not a problem when handled correctly.
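To get a feel for how accurate the timing has to be, a commonly used approximation for random clock jitter says that a full-scale sine at frequency f, sampled with RMS timing error t, has a signal-to-noise ratio of about -20·log10(2π·f·t) dB. The sketch below just evaluates that formula. The function name and the example jitter values are assumptions for illustration, and the approximation only holds when the jitter is small compared with the signal period.

```python
import math

def jitter_snr_db(freq_hz, rms_jitter_s):
    """Approximate SNR limit for a full-scale sine sampled with random clock jitter."""
    return -20 * math.log10(2 * math.pi * freq_hz * rms_jitter_s)

# Worst case in the audio band: a 20 kHz tone at full scale.
for jitter in (1e-9, 100e-12, 10e-12):
    print(f"{jitter * 1e12:7.0f} ps RMS -> {jitter_snr_db(20000, jitter):6.1f} dB")
```

Each tenfold reduction in jitter buys 20 dB, which is why more accurate timing is enough to push this distortion down to a negligible level.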
Finally, an often overlooked detail in digital audio discussion is that Nyquist’s samples are instantaneous values, that is, impulses. Our digital systems generally output stairsteps to the converter and low-pass filter, holding the current sample level until the next. This causes a frequency droop and loss of highs, because impulses carry more high-frequency energy than stairsteps. The solution is not to produce impulses (which are impossible to produce perfectly) but simply to adjust the frequency response with filtering. Fortunately, it’s trivial to add this adjustment to an oversampling filter.
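The size of the stairstep droop can be estimated from the frequency response of an ideal sample-and-hold, which follows sin(πf/fs)/(πf/fs). The Python sketch below evaluates it at 20 kHz for a 44.1 kHz output rate and for an 8x oversampled rate. The function name and the 8x figure are assumptions chosen for illustration.

```python
import math

def zoh_droop_db(freq_hz, sample_rate_hz):
    """Gain of an ideal zero-order hold (stairstep output) at freq_hz, in dB."""
    if freq_hz == 0:
        return 0.0
    x = math.pi * freq_hz / sample_rate_hz
    return 20 * math.log10(abs(math.sin(x) / x))

base_rate = 44100
print(f"20 kHz at 1x: {zoh_droop_db(20000, base_rate):.2f} dB")
print(f"20 kHz at 8x: {zoh_droop_db(20000, 8 * base_rate):.3f} dB")
```

At the base rate the droop at 20 kHz is a bit over 3 dB. At the oversampled rate it shrinks to a few hundredths of a dB, and whatever remains can be corrected with a gentle boost in the filter.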
Nyquist requires an infinite number of samples, not just a band-limited signal.
Nyquist does not require a low-pass filter;
that is an engineering decision which may be required or not.
Of course a signal cannot be perfectly bandlimited if it’s finite, but that information is more pedantic than practical (the site’s tag line is “Practical digital audio signal processing”), and this article is extremely general (not an integration or summation to be seen).
The lowpass filter I referred to is required; it’s the output filter, not input. (I didn’t mention the need for an input filter, just that the input must be bandlimited. An input filter is indeed an engineering decision, but it’s one that’s always made for the general case.)
Yes, I misspoke in saying that Nyquist himself required it, but I was using “Nyquist” in a broad sense to include Nyquist-Shannon-etc.
at nyquist limit, your sine wave is a triangle, put simply. it is *possible* to reconstruct the exact signal up to nyquist (assuming perfect sampling to begin with), but is it likely to be implemented that well in your battery powered portable music device?
there’s also the argument that transients affect the quality of audible frequencies.
and still, people pay apple over and over for their own ‘copy’ of a song – at 128 kbps AAC – 44.1kHz at 16 bits. eurgh!
oh yeah, and awesome article and all that. 😀
The sine wave is a triangle at (near) Nyquist only if you “connect the dots” (by interpolating linearly at a very high oversampling rate, for instance). In reality, it’s alternating impulses. But we don’t play it back that way—we run it through a lowpass filter. So, you get a sine wave (except in the practical world you get an attenuated sine wave, due to the slope of the filter, and some aliasing, depending on the filter quality).
Glad you liked it!
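The difference between “connecting the dots” and proper reconstruction can be checked numerically. The Python sketch below samples a sine at 0.45 cycles per sample (Nyquist is 0.5), then estimates the waveform halfway between samples two ways: by linear interpolation, and by a truncated Whittaker-Shannon sinc sum standing in for the ideal lowpass filter. The frequency, the number of samples, and the function names are assumptions for this example, and the sinc sum is only approximate because it is cut off at a finite length.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

F = 0.45      # signal frequency in cycles per sample (Nyquist is 0.5)
HALF = 2000   # samples kept on each side of t = 0

samples = {n: math.sin(2 * math.pi * F * n) for n in range(-HALF, HALF + 1)}

def reconstruct(t):
    """Truncated sinc interpolation, approximating an ideal lowpass filter."""
    return sum(x * sinc(t - n) for n, x in samples.items())

def connect_dots(t):
    """Linear interpolation between the two neighboring samples."""
    n0 = math.floor(t)
    frac = t - n0
    return (1 - frac) * samples[n0] + frac * samples[n0 + 1]

def true_value(t):
    return math.sin(2 * math.pi * F * t)

# Check the points halfway between samples, near the middle of the record.
points = [k + 0.5 for k in range(-10, 10)]
sinc_err = max(abs(reconstruct(t) - true_value(t)) for t in points)
line_err = max(abs(connect_dots(t) - true_value(t)) for t in points)

print(f"worst error, sinc reconstruction: {sinc_err:.4f}")
print(f"worst error, connect the dots:    {line_err:.4f}")
```

The straight-line version misses the true waveform by most of its amplitude, while the sinc sum lands within a small fraction of a percent, limited here only by the truncation.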
But was Nyquist correct? I’ve never managed to get an answer to this, so as an expert in such things, can you help me understand?
You could sample a sine wave at twice its frequency and get the zero crossing point every time, yes? No way can you reconstruct the original from that. Similarly you could sample it off zero and get the correct frequency, but you could fit any number of valid sine waves of various amplitudes and phase shifts to the samples.
Even as the absolute lower limit, double-frequency sampling seems flawed. It must surely be well below the lower limit for accurate reproduction? Isn’t it misleading to regard Nyquist’s sampling frequency as an absolute truth, as it is usually presented?
If you could be certain to start sampling at a peak, then Nyquist would arguably be right, but no-one seems to mention that, and in any case it seems an unreasonable constraint when you usually want to record an arbitrary signal starting at an arbitrary time. Is it right to apply Nyquist in the real recording/playback world?
Love the site. Will be coming back.
Ah—you caught a goof—thanks!
The sampling rate must always be greater than twice the highest component, not “at least twice” as I said at the top of the article. (The sampling theorem is clear about this.) In Sampling in-depth, for instance, I say, “…as long as the sampling rate is greater than twice the highest frequency component in the continuous signal.” As a practical matter, the anti-aliasing filter requires that we start rolling off before we get too close to half the sampling frequency anyway.
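The failure at exactly twice the frequency is easy to demonstrate. In the Python sketch below, a sine sampled at exactly two samples per cycle gives samples equal to (-1)^n · A · sin(phase), so a zero-phase sine vanishes entirely and two sines with different amplitudes and phases produce identical samples. The function name and the specific amplitudes and phases are assumptions picked for illustration.

```python
import math

def sample_at_twice_frequency(amplitude, phase, count=16):
    """Sample amplitude*sin(2*pi*f*t + phase) at exactly 2 samples per cycle."""
    # With t = n / (2 f), the argument reduces to pi*n + phase.
    return [amplitude * math.sin(math.pi * n + phase) for n in range(count)]

# Case 1: sampling lands on every zero crossing.
zeros = sample_at_twice_frequency(1.0, 0.0)

# Case 2: two different sines that share the same samples.
a = sample_at_twice_frequency(1.0, math.pi / 2)   # amplitude 1, sampled at its peaks
b = sample_at_twice_frequency(2.0, math.pi / 6)   # amplitude 2, sampled off-peak

print("largest sample at zero crossings:", max(abs(v) for v in zeros))
print("largest difference between a and b:", max(abs(x - y) for x, y in zip(a, b)))
```

Both printed values are zero to within floating-point rounding, which is the ambiguity the sampling theorem avoids by requiring a rate strictly greater than twice the highest frequency.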
Nice article, and it’s great to see “digital” being defended, though I have a few quibbles:
That some people (including me) like the sound of subtle distortion on some types of music is not an argument for preferring analog recording mediums. Digital recording is indeed better than analog tape (or direct to disk) because the very definition of high fidelity is faithfulness to the source. Even 16-bit digital beats analog tape by a mile in every way one can assess fidelity. Adding distortion should be part of the production process, but it’s not needed or wanted for reproduction. And you certainly don’t want a crunchy sound on everything, for example a tambourine or flute. If you want distortion on a particular track or full mix, there are tape-sim and tube-sim plug-ins.
Yes, oversampling avoids the problems you mentioned, but they were never serious problems anyway. Certainly nowhere near as bad as tape hiss or the grunge you get after only two or three copy generations. Even way back in the 1980s when 14 bits was about the best converters could manage, so-called golden ears failed blind listening tests that inserted a 44/16 “bottleneck” into an analog signal path. So that proves that digital is in fact “perfect,” and was perfect even 30 years ago. At least with the better quality professional converters. Today converter chips are so good, and so cheap, that even $30 sound cards are audibly transparent.
Yes, every volume (or other) operation adds a tiny amount of distortion, but it’s really REALLY tiny with modern software that processes using 32 bit FP math. A few years ago I applied 60 sequential volume changes to a pure sine wave created at 32 bits and the only difference on an FFT was stuff down around -150 dB. When I did the same to a 16-bit sine wave the differences were more visible, but still below -120 dB. So sure, dither in theory can reduce digital distortion, from 1/100th the distortion of a typical loudspeaker to slightly less than that. Perspective people! :->)
Finally (sorry), clock jitter is not a problem. It has never been a problem. Not even a tiny problem. Why? Because it’s 120+ dB below the music and thus inaudible. Jitter is noise, or distortion, or both, but it’s so incredibly soft no human can possibly hear it. Reports that jitter affects things like fullness and stereo imaging are silly and trivial to disprove with basic measurements.
Thanks for your perspective, Ethan! No, I don’t disagree one bit. Realize that when I said (20 years ago) that clock jitter is a “problem”, and some other comments, I meant in the engineering sense, a potential problem in recreating audio that has to be considered and solved in the hardware.
As for dither, you might get from my more recent videos that dither (or not) at 16 bits and up won’t be heard. I concede that you might as well dither 16 bits, not because it’s needed in typical recordings, but because you can conceivably stage an unrealistic example that marginally reveals a difference—because dither won’t hurt, you might as well have peace of mind and use it (or not). However, it’s utter folly at 24-bit. This puts me at odds with other audio folks (but sorry, they are wrong).
Yes, you are right that calcs are done in 32-bit float, but the accuracy of that calculation is not the problem. You could do it at 128-bit float and the problem would still be identical: it’s when you truncate the result. But again, it’s an engineering “problem”. It can be a serious problem if you ignore it, if the truncation is in intermediate calculations; even 32-bit float math is often not enough (floating point multiplies always truncate). Not an issue if it’s the final truncation to 16 bit.
Regards,
Nigel
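The point about intermediate truncation can be seen with a toy example. The Python sketch below halves a sample and then doubles it again, once truncating to an integer after every step and once keeping full precision until a single final rounding. The function names and the two-step gain chain are assumptions for illustration, and Python’s double-precision floats stand in for whatever wider intermediate format a real system would use.

```python
def gain_chain_truncating(sample, gains):
    """Apply each gain, truncating to an integer after every step."""
    value = sample
    for g in gains:
        value = int(value * g)  # int() discards the fractional part
    return value

def gain_chain_float(sample, gains):
    """Apply all gains at full precision, rounding only once at the end."""
    value = float(sample)
    for g in gains:
        value *= g
    return round(value)

gains = [0.5, 2.0]  # halve, then restore the original level

print(gain_chain_truncating(12345, gains))  # loses the bottom bit
print(gain_chain_float(12345, gains))       # comes back unchanged

# Count how many of the first 1000 sample values are damaged by the
# intermediate truncation.
damaged = sum(
    1 for s in range(1, 1001)
    if gain_chain_truncating(s, gains) != gain_chain_float(s, gains)
)
print("samples changed by intermediate truncation:", damaged)
```

Every odd-valued sample comes back one step low in the truncating version, which is why the wordlength of intermediate results matters more than the precision of any single multiply.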
I’m sure we agree on all this stuff, and I didn’t want my many comments to seem like criticism. It’s hard to argue against using dither because, as you say in your video (which I saw just an hour ago), it can’t hurt and might possibly help. The lowered distortion can surely be measured. But I’ve never found a situation where dithering is audible. Yes, if you raise the gain 50 dB as a reverb tail fades to silence you might hear a difference with dither. And for sure you can hear dither on 8-bit content. But in practice, at normal playback volumes, with 16-bit music recorded at a sensible level, I doubt anyone could tell if dither was used. I harp on this because so many people make such ridiculous claims about “obvious” improvements. Or even sillier, that dither affects fullness and imaging, as some people also claim about jitter. People obsess over the stupidest things while ignoring the acoustics of the rooms they mix in. That matters literally 1,000 times more than whether or not you use dither. :->)
A highly respected and credited engineer friend refused to send me some audio for a non-critical demonstration via the internet because transferring digital audio files makes them sound different…thanks for stopping by, Ethan 😉
Nigel, your friend is mistaken regardless of how respected or credited an engineer he might be. Transferring a digital audio file from one person to another does nothing. One might as well posit that downloading software is impossible because doing so mangles the bits and thus renders the software inoperable. We know that is not the case.
Funny how hard it is to convince some people otherwise! Yes, downloading software is a good example. Also, I’ve used the example of financial records, how devastating it would be to the world’s economy if bank data couldn’t be reliably transferred between computers. I’ve found it falls on deaf ears when the person is certain that music sounds different after being transferred…especially when they are a Grammy Award winning engineer.