One ugly thing that we need to be aware of, especially for filters, is the de-normalization of numbers. Basically, computer processors try to keep floating point numbers normalized: they keep them in the binary form 1.xxxxxx (where each x is 0 or 1) times 2 raised to a positive or negative power. When numbers become extremely small (smaller than the smallest value the most negative exponent can represent in normalized form), the processor tries to maintain precision by letting the number become de-normalized, slipping to 0.xxxxxx times 2 raised to the lowest exponent, trading precision for the ability to represent still smaller numbers. The catch is that most processors become much slower when performing computations with de-normalized numbers (“denormals”).
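If you want to see exactly where that boundary lies, here's a minimal sketch (my example, assuming IEEE 754 single precision and the standard std::numeric_limits facilities) that prints the smallest normalized float and the smallest denormal float:

```cpp
#include <cstdio>
#include <limits>

int main() {
    // Smallest positive *normalized* float: 2^-126, about 1.18e-38
    std::printf("smallest normal:   %g\n", std::numeric_limits<float>::min());
    // Smallest positive *denormal* float: 2^-149, about 1.4e-45
    std::printf("smallest denormal: %g\n", std::numeric_limits<float>::denorm_min());
}
```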
Denormals happen when you multiply a small number by a tiny number. Typically, denormals are created in filters that ring out with no new input. Almost any new input, including background noise when recording, will keep denormals from appearing. However, if you process some sound, then follow it with samples of zero, a filter or reverb can decay to denormals; the denormals are then passed to the next processing block and cause it to slow as well.
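You can watch that decay happen in a few lines. Here's a minimal sketch (mine, not from any filter article) of a filter state ringing out on zero input; after several hundred samples it has decayed into the denormal range. Compile it without fast-math or flush-to-zero settings, or the denormals will be flushed before you can see them:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    float z = 1.0f;                // filter state, "ringing out"
    for (int n = 0; n < 1000; ++n) {
        z *= 0.9f;                 // one-pole decay, zero input
        if (std::fpclassify(z) == FP_SUBNORMAL) {
            std::printf("denormal at sample %d: %g\n", n, z);
            break;
        }
    }
}
```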
Some modern processors have mode flags that make them treat denormals as zero, solving our problem, if your code enforces that setting. You can also repair a denormal by examining the bits of a floating point number and substituting zero when you find one. But that’s a lot of work per sample. A simpler way to avoid denormals is to add a tiny but normalized number to calculations that are at risk. 1E-18, for instance, is 360 dB down from unity, so it won’t affect the audio path; but added to a denormal it yields exactly 1E-18, because floating point addition discards bits that far below the larger operand.
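As an illustration of the test-and-substitute approach, a minimal sketch (my example) using std::fpclassify from the standard library:

```cpp
#include <cmath>

// Replace a denormal with zero; pass anything else through untouched.
inline float flushDenormal(float x) {
    return (std::fpclassify(x) == FP_SUBNORMAL) ? 0.0f : x;
}
```

You'd have to call this on each feedback state every sample, which is exactly the per-sample cost the additive-constant trick avoids.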
The Pentium 4 in particular becomes devastatingly slow, so slow that it can’t keep up with real-time audio processing. For audio purposes, denormals might as well be zero. They are so many dB down from unity that their only effect is the potential of bogging down our processing.
While denormal protection could be built into the filter code, it’s more economical to apply it at a more global level instead of in each module. As an example of one possible solution, you could add a tiny constant (1E-18) to every incoming sample at the head of your processing chain, switching the sign of the constant (to -1E-18) after every buffer you process, so that a highpass filter in the chain can’t remove what would otherwise be a constant DC offset.
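A sketch of that idea, assuming a simple buffer-based processing callback (the class and names are mine, not from any particular framework):

```cpp
// Add a tiny, sign-alternating offset at the head of the chain so that
// downstream filter states never decay all the way into the denormal range.
class DenormalGuard {
public:
    void processBuffer(float* buf, int numSamples) {
        for (int n = 0; n < numSamples; ++n)
            buf[n] += offset;
        offset = -offset;   // flip each buffer so a highpass can't remove it
    }
private:
    float offset = 1e-18f;  // 360 dB down: inaudible, but safely normalized
};
```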
The best solution depends on your exact circumstances, so you should make a point of understanding the threat. For instance, in the biquad code there is a potential to generate denormals in the calculations of z1 and z2. If the filter input goes to zero, the output alone recirculates through z1 and z2, getting closer and closer to zero until the values become de-normalized; they then continue to recirculate in slower de-normalized calculations, as well as passing to the next processing block and slowing its calculations too.
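To make the recirculation concrete, here's a transposed direct form II biquad along the lines of the one discussed earlier (a sketch; the a0–a2, b1–b2 coefficient naming follows that convention and may differ from yours). With in equal to zero, only z1 and z2 feed the output, and the feedback through b1 and b2 decays toward the denormal range:

```cpp
struct Biquad {
    float a0, a1, a2, b1, b2;    // coefficients
    float z1 = 0.0f, z2 = 0.0f;  // state

    float process(float in) {
        float out = in * a0 + z1;
        // With in == 0, the states keep recirculating through b1/b2,
        // decaying toward zero and eventually into denormal territory.
        z1 = in * a1 + z2 - b1 * out;
        z2 = in * a2 - b2 * out;
        return out;
    }
};
```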
Lest you think that you need only be concerned about denormals when fine-tuning your code, take a look at the performance hit for various processors (special thanks to RCL for permission to reproduce his chart):
Yes, that’s more than 100 times slower execution of test code on a Pentium 4 when the calculations become denormalized. I hit this one myself, years ago: after working primarily on DSP chips, which are unaffected by this issue, I was developing a commercial product for Mac and PC that ran on the host CPU. I did the initial development on a PowerPC processor, then gave it a run on my PC, a P4. Audio processing came to a dead stop. Fortunately, I was aware of the issue with denormals, and went straight to a solution that fixed it.
Please read RCL’s excellent article on denormals, “Denormal floats across architectures”, in which he goes into more detail, including comparisons of various CPU architectures.
Another variant in the x86/amd64 case is to set the FPU’s flush-denormals-to-zero bit for the thread handling audio data. I’ve had an implementation with lots of parallel biquads that suffered from denormals before setting that bit; with it set, everything works like a charm. The advantage, of course, is that you don’t need to fiddle with the anti-denormal constant.
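On x86/amd64 those bits live in the MXCSR register, and the SSE intrinsics headers expose macros for them. A minimal sketch; note that MXCSR is per-thread state, so this must run on each audio thread, and it affects only SSE floating point math:

```cpp
#include <xmmintrin.h>   // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>   // _MM_SET_DENORMALS_ZERO_MODE

void enableDenormalFlushing() {
    // FTZ: denormal *results* are flushed to zero.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    // DAZ: denormal *inputs* are treated as zero (supported on most modern x86 CPUs).
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```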