Oversampling

In this discussion, “oversampling” means oversampling on output—at the digital-to-analog conversion stage. There is also a technique for oversampling at the input (analog-to-digital) stage, but it is not nearly as interesting, and in fact is unrelated to oversampling as discussed here.

Motivation for oversampling

Most people have heard the term “oversampling” applied to digital audio devices. While it’s intuitive that sampling and playing back something at a higher rate sounds better than a lower rate—more points in the waveform for increased accuracy—that’s not what oversampling means.

In fact, the truth is much less intuitive: Oversampling means generating more samples from a waveform that has already been digitally recorded! How can we get more samples out than were recorded?!

For background, let’s look at the “classic” digital audio playback system, the Compact Disc: The digital audio samples—numbers—are sent at 44.1 KHz, the rate at which they were recorded, to a low-pass filter. By Nyquist’s Theorem, the highest frequency we can play back is less than half the recorded rate, so the upper limit is 22.05 KHz. Everything above that is aliased frequency components—where the audio “reflects” around the sampling frequency and its multiples like a hall of mirrors. The low-pass filter, also called a reconstruction filter or anti-aliasing filter, is there to block the reflections and let the true signal pass.

One problem with this is that, ideally, we want to block everything above the Nyquist frequency (22.05 KHz), but let everything below it pass unaffected. Filters aren’t perfect, though. They have a finite slope as they begin attenuating frequencies, so we have to compromise. If we can’t keep 22 KHz while blocking everything above it, we’d certainly like to shoot for 20 KHz. That means the low-pass filter’s cutoff must go from about 0 dB attenuation at 20 KHz to something like 90 dB at 22 KHz—a very steep slope.
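
To put a number on just how steep that is, here’s a quick back-of-the-envelope calculation (the variable names and the exact figures are mine, for illustration):

```python
import math

# How steep is a filter that falls from ~0 dB at 20 kHz to ~90 dB down at 22 kHz?
# Express the transition width in octaves, then compute the required slope.
f_pass = 20_000.0    # edge of the band we want to keep (Hz)
f_stop = 22_000.0    # where we need roughly 90 dB of attenuation (Hz)
atten_db = 90.0

octaves = math.log2(f_stop / f_pass)    # a bit under 1/7 of an octave
slope = atten_db / octaves              # required slope in dB per octave

print(f"{octaves:.3f} octaves -> {slope:.0f} dB/octave")
```

For comparison, each pole of an analog filter buys about 6 dB/octave of ultimate slope, which gives a feel for how many precise components such a filter needs.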

While we can do this in an analog filter, it’s not easy. Filter components must be very precise. Even so, a filter this steep has a great deal of phase shift as it nears the cut-off point. Besides the expense of the filter, many people agree that the phase distortion of the upper audio frequencies is not a good thing.

Now, what if we had sampled at a higher rate to begin with? That would let us get away with a cheaper and more gentle output filter. Why? Since the reflections are wrapped at the sampling frequency and its multiples, moving the sampling frequency that far up moves the reflected image far from the audio portion we want to preserve. We don’t need to record higher frequencies—the low-pass filter will get rid of them anyway—but simply having more samples of our audio signal would be a big help.

This is where interpolation comes in. We calculate what it would look like if we had sampled with more points to begin with. If we could have, for instance, eight times as many sample points running at eight times the rate (“8X oversampling”), we could use a very gentle filter, because instead of 2 KHz of room to get the job done, we’d have 158 KHz.

In practice, we do exactly this, following it with a phase linear digital “FIR” (finite-impulse response) filter, and a gentle and simple (and cheap) analog low-pass filter. If you buy the fact that giving ourselves more room to weed out the reflections—the alias components—solves our problems, then the only part that needs some serious explaining is…

Where do the extra samples come from?

First, let’s note that in the analog domain, the sampling rate is essentially infinite—the waveform is continuous, not a series of snapshots as with a digitized waveform. So, you could say that the low-pass reconstruction filter converts from the output sampling rate to an infinitely high sampling rate. It’s easy to see that we could sample the output of the low-pass filter at a higher rate to increase the sampling rate. In fact, since we don’t need to convert to the analog domain at this point, we could simply use a digital low-pass filter to reconstruct the digital waveform at a higher sampling rate directly.

Interpolating filters

There is more than one way to make a digital low-pass filter that will do the job. We have two basic classes of filters to choose from. One is called an IIR (infinite impulse response), which is based on feedback and is similar in principle to an analog low-pass filter. This type of filter can be very easy to construct and computationally inexpensive (few multiply-adds per sample), but has the drawback of phase shift. This is not a fatal flaw—analog filters have the same problem—but the other type of digital filter avoids the phase shift problem. (IIR filters can be made with zero relative phase shift, but it greatly increases complexity.)
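
To make the “few multiply-adds per sample” point concrete, here is the simplest possible IIR low-pass, a generic one-pole design (the function name and coefficient choice are mine, not any particular product’s filter):

```python
def one_pole_lowpass(x, a):
    """The simplest IIR low-pass: y[n] = (1 - a)*x[n] + a*y[n-1].
    One multiply-add (plus a scale) per sample, built on feedback,
    much like an analog RC filter. Larger a = lower cutoff."""
    y, prev = [], 0.0
    for sample in x:
        prev = (1.0 - a) * sample + a * prev   # feedback: output depends on last output
        y.append(prev)
    return y

# A step input settles toward 1.0, never quite reaching it in finite time:
print(one_pole_lowpass([1.0, 1.0, 1.0, 1.0], 0.5))
```

The feedback term is also why the impulse response is infinite: some fraction of every sample circulates forever.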

FIR filters are phase linear, and it’s relatively easy to create any response. (In fact, you can create an FIR filter that has a response equal to a huge cathedral for impressive and accurate reverb.) The drawback (starting to get the idea that everything has a trade-off?) is that the more complex the response (steep cut-off slope, for instance), the more computation required by the filter. (And yes, unfortunately our “cathedral” would require an enormous number of computations, and in fact digital reverbs of today don’t work this way.)

Fortunately, we need only a gentle cut-off slope, and an FIR will handle that easily.

An FIR is a simple structure—basically a tapped delay line, where the taps are multiplied by coefficients and summed for the output. The two variables are the number of taps, and the values of the coefficients. The number of taps is based on a compromise between the number of coefficients we need to produce the desired result, and the number we can tolerate (since each coefficient requires a multiplication and addition).
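
The tapped-delay-line structure can be sketched in a few lines; `fir_filter` is a name of my choosing, and the loop trades efficiency for clarity:

```python
def fir_filter(x, coeffs):
    """Direct-form FIR: a tapped delay line, where each tap is multiplied
    by a coefficient and the products are summed for each output sample."""
    delay = [0.0] * len(coeffs)   # the delay line, most recent sample first
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]   # shift the line, insert the new sample
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out

# A 2-tap moving average (coefficients 0.5, 0.5) is about the simplest FIR low-pass.
# Feed it an impulse and the coefficients come back out, the impulse response:
print(fir_filter([1.0, 0.0, 0.0], [0.5, 0.5]))
```

Note that feeding in a single impulse returns the coefficients themselves, which previews the next point: the coefficients *are* the filter’s impulse response.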

How do we know what numbers to use to yield the desired result? Conveniently, the coefficients are equivalent to the impulse response of the filter we’re trying to emulate.

So, we need to fill the coefficients with the impulse response of a low-pass filter. The impulse response of an ideal low-pass filter is described by sin(x)/x. If you plot this function, you’ll see that it’s basically a sine wave that has full amplitude at time 0, and decays in both directions as it extends toward positive and negative infinity.

If you’ve been following closely, you’ll notice that we have a problem. The number of computations for an FIR filter is proportional to the number of coefficients, and here we have a function for the coefficients that is infinite. This is where the “compromise” part comes in.

If we truncate the series around zero—simply throwing away “extra” coefficients at some point—we still get a low-pass filter, though not one with a perfect cut-off slope, and with ripple in the “stop band”. After all, the sin(x)/x function emulates a perfect low-pass filter—a brick wall. Fortunately, we don’t need a perfect one, and our budget version will do. We also use some math tricks—artificially tapering the response off, even quickly, gives much better results than simply truncating. This technique is called “windowing”, or multiplying by a window function.
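
Here’s what the truncate-and-window step looks like in code—a sketch, assuming a Hann (raised-cosine) window and unity-gain normalization; the function name and parameters are mine:

```python
import math

def windowed_sinc(num_taps, cutoff):
    """Low-pass FIR coefficients: a truncated sin(x)/x impulse response,
    tapered by a Hann window to tame the ripple that plain truncation
    causes. cutoff is a fraction of the sampling rate (0 to 0.5); the
    result is normalized for unity gain at DC."""
    m = num_taps - 1
    coeffs = []
    for n in range(num_taps):
        k = n - m / 2.0                       # distance from the center tap
        if k == 0:
            h = 2.0 * cutoff                  # limit of the sinc at its center
        else:
            h = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / m)   # Hann window taper
        coeffs.append(h * w)
    scale = sum(coeffs)                       # normalize: gain of exactly 1 at DC
    return [c / scale for c in coeffs]

taps = windowed_sinc(63, 0.25)   # a gentle low-pass at 1/4 the sampling rate
```

More taps buy a steeper cut-off; a gentler window buys less stop-band ripple at the cost of a wider transition. Everything has a trade-off.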

As a bonus, we can take advantage of the FIR to fix some other minor problems with the signal. For instance, Nyquist promised perfect reconstruction in an ideal mathematical world, not in our more practical electronic circuits. Besides the lack of an ideal low-pass filter that’s been covered here, there’s the fact we’re working with a stair-step shaped output before the filter—not an ideal series of impulses. This gives a little frequency droop—a gentle roll off. We can simply superimpose a complementary response on the coefficients and fix the droop for “free”.

While we’re at it, we can use the additional bits gained from the multiplies to help in noise shaping—moving some of the in-band noise up to the frequencies that will be removed later by the low-pass filter, and to frequencies the ear is less sensitive to.

More cool math tricks to give us better sound!


Digital audio: theory and reality

The promise of perfect audio—the Nyquist Theorem

Most people who’ve looked at digital audio before know about the Nyquist theorem—if you sample an analog signal at a rate of at least twice its highest frequency component, you can convert it back to analog, passing it through a low-pass filter, and get back the same thing you put in. Exactly. Perfectly.

[figure: sampling illustration]

The real world

In the real world, though, many people argue that analog “sounds better.” How can this be, if digital audio is perfect?

For one thing, we’ve grown to like some of the deficiencies of analog recording. Just as tube amplifiers give a more pleasant distortion and compression to musical signals than transistors, analog tape similarly warms up and fattens the sound.

Of course, this alone isn’t a reason to forsake digital’s many conveniences. We can always use other means, such as tube compressors, to fatten the sound if needed. The real problems lie with the real-world problems Nyquist didn’t warn us about.

First, there is no such thing as the perfect low-pass filter required by Nyquist’s theorem. A real filter has a finite slope, so we need to set its cut-off a little lower than theory. Also, a steep filter has a lot of phase shift near and above the cutoff. And some aliasing is bound to leak through at the very high end. A technique called oversampling has been developed to reduce these problems.

Another big problem is finite wordlength effects—we’re using 16-bit samples, not the pure numbers of the Nyquist theorem, so we have to compromise the sample values. To start, 16 bits is not as great as it seems. Yes, it translates into 96 dB dynamic range, but that’s an absolute ceiling—you can’t go any higher. So, the average music level must be much lower in order to allow headroom for peaks. And, at the low amplitude end, distortion of small-signal components is very high, contributing to the “brittle” sound that many people ascribe to digital audio. On top of this, any gain change (from mixing tracks or changing volumes) causes individual samples to be rounded to the nearest bit level, adding distortion. Fortunately, a technique called dithering relieves these problems.
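
Two of these claims are easy to demonstrate with a few lines of code—a sketch with names and signal levels of my choosing. The 96 dB figure is just 20·log10(2¹⁶), and a signal below half a quantization step simply rounds away to silence unless dither is added:

```python
import math
import random

bits = 16
dynamic_range_db = 20 * math.log10(2 ** bits)
print(f"{bits}-bit dynamic range: {dynamic_range_db:.1f} dB")

# A sine that peaks at 0.4 LSB rounds to pure silence without dither.
# TPDF dither (the sum of two uniform random values) trades that hard
# distortion for a benign noise floor, preserving the signal underneath.
random.seed(12345)   # fixed seed so the sketch is repeatable

def quantize(x, dither=False):
    d = (random.random() + random.random() - 1.0) if dither else 0.0
    return round(x + d)

signal = [0.4 * math.sin(2 * math.pi * n / 32) for n in range(3200)]
undithered = [quantize(s) for s in signal]
dithered = [quantize(s, dither=True) for s in signal]
print("undithered sub-LSB sine rounds to:", set(undithered))   # nothing but 0
```

Averaging many periods of the dithered output recovers the sine; the undithered version is gone for good.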

Clock jitter is another problem. If the sample clock timing is not perfect, it creates another kind of distortion. For a self-contained unit, the solution is simply more accurate timing; reducing timing errors reduces the distortion to a negligible level. When digitally interfacing with other units, though, the issue becomes a little more complex, but is not a problem when handled correctly.

Finally, an often overlooked detail in digital audio discussion is that Nyquist’s samples are instantaneous values—impulses. Our digital systems generally output stairsteps to the convertor and low-pass filter, holding the current sample level until the next. This causes a frequency droop and loss of highs—impulses carry more high-frequency energy than stairsteps. The solution is not to produce impulses—which are impossible to produce perfectly—but to simply adjust the frequency response with filtering. Fortunately, it’s trivial to add this adjustment to an oversampling filter.
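
The droop is easy to quantify: the stair-step (zero-order hold) output has the well-known amplitude response sin(πf/fs)/(πf/fs) relative to ideal impulses. A small sketch (the function name is mine) shows both the size of the problem and why oversampling nearly eliminates it:

```python
import math

def zoh_gain_db(f, fs):
    """Amplitude response of a stair-step (zero-order hold) DAC output,
    relative to an ideal impulse train: sin(pi f/fs) / (pi f/fs), in dB."""
    x = math.pi * f / fs
    return 20 * math.log10(math.sin(x) / x)

print(f"droop at 20 kHz, 44.1 kHz output: {zoh_gain_db(20e3, 44.1e3):.2f} dB")
print(f"droop at 20 kHz, 8x (352.8 kHz) output: {zoh_gain_db(20e3, 352.8e3):.2f} dB")
```

At the original rate the top of the audio band droops by about 3 dB; at eight times the rate the loss is a few hundredths of a dB, and the leftover is trivially folded into the oversampling filter’s coefficients.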


MIDI overview

This chapter presents a brief overview of the Musical Instrument Digital Interface—MIDI. You should also have a more detailed reference on the subject, especially if you need to understand advanced features not covered here, such as MIDI Time Code and Sample Dump Standard.

Introduction

The MIDI specification details a combination of hardware and software, enabling synthesizers, computers, effects, and other MIDI devices to communicate with each other. Communication may be one-way (sending or receiving) or two-way (sending and receiving). For instance, a simple effects processor might have only MIDI input, to allow remote MIDI selection of program number. Synthesizers usually have MIDI input and output. They can receive requests to play notes from other keyboards or from a computer, and they can send notes played on the unit’s own keyboard. Program changes and actual program information can be sent and received.

Numbers and conventions

Often, MIDI documentation refers to number values in decimal, hexadecimal (often called hex), or binary, as is convenient. Tables often denote MIDI bytes as binary, such as 1011nnnn or 0vvvvvvv. Otherwise, if not noted or obvious, assume decimal. Hexadecimal is used as a shorthand for binary, usually preceded by a dollar sign ($)—as in this text—or followed by an H. (For instance, $7E and 7EH stand for hexadecimal 7E.)

MIDI hardware interface

The MIDI interface operates at 31.25 Kbaud, which works out to 320 microseconds per byte. Since most MIDI messages consist of two or three bytes, this means that it takes less than a millisecond to send a MIDI command.
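
The arithmetic behind those figures: MIDI is asynchronous serial, so each byte is framed by a start bit and a stop bit—ten bits on the wire per eight-bit byte:

```python
# Each MIDI byte is framed as 1 start bit + 8 data bits + 1 stop bit,
# so 10 bits travel per byte at 31,250 bits per second.
baud = 31_250
bits_per_byte = 10
us_per_byte = bits_per_byte * 1_000_000 / baud   # microseconds per byte

print(f"{us_per_byte:.0f} microseconds per byte")
print(f"3-byte message: {3 * us_per_byte / 1000} ms")
```

A three-byte message therefore takes 0.96 ms, just under the millisecond claimed above.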

The serial data is transferred in a current loop configuration. Many devices have a MIDI thru, which simply passes the MIDI input. You may use these to daisy-chain MIDI devices, but a chain of three devices is the practical limit, since each thru adds timing distortion to the MIDI signal, making it difficult for the receiver to interpret the data correctly. Y-cords are not appropriate for either splitting or combining MIDI data. You must use MIDI thru boxes to distribute, and mergers to combine MIDI streams.

Proper MIDI cables are made from shielded twisted pair cable, and should be a maximum length of 50 feet (15 meters). (Beyond using quality-built MIDI cables, there is no advantage to using expensive or esoteric cables. They have no effect on the MIDI transfer or the sound quality of your instrument.)

As a final hardware note, the thoughtful folks who brought us MIDI deemed that the connections would be opto-isolated. This eliminates the possibility of ground loops through the MIDI cables. Also, you will not harm your MIDI ports if you accidentally plug an output into another output (but it won’t do anything interesting either).

MIDI data format

MIDI communications happen through multibyte messages consisting of one status byte, optionally followed by one or two data bytes, except for system exclusive messages, which have an arbitrary number of data bytes. Status bytes have their most significant bit (MSB) set to differentiate them from data bytes, so status bytes range in value from 128 ($80) to 255 ($FF), while data bytes range from 0 to 127 ($7F).
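
The status/data split is a single bit test; a one-line sketch captures the rule (the function name is mine):

```python
def classify(byte):
    """Split a MIDI byte by its most significant bit:
    status bytes ($80-$FF) have the MSB set, data bytes ($00-$7F) don't."""
    return "status" if byte & 0x80 else "data"

print(classify(0x90))   # note on, channel 1: a status byte
print(classify(0x3C))   # 60, middle C as a note number: a data byte
```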

MIDI supports 16 message channels, letting you link multiple devices while maintaining individual control. Messages sent on specific channels, such as note on and note off, are called channel messages. Messages that are not channel oriented are called system messages. See Table 1 at the end of this chapter for a summary of MIDI messages.

Channel messages

Channel messages contain their channel number in the lower four bits of the status byte. A value of 0 corresponds to channel 1, 1 to channel 2, and so on, up to a value of 15 (for MIDI channel 16). When status bytes are listed as 1011nnnn (binary), the nnnn part refers to the channel part of the status byte. Similarly, in $Bn, the n refers to the channel part, in hexadecimal.
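
In code, extracting the channel is just a mask of the low four bits plus one (the function name is mine):

```python
def channel_of(status):
    """Extract the 1-based MIDI channel from a channel-message status byte:
    the low four bits hold (channel - 1)."""
    return (status & 0x0F) + 1

print(channel_of(0xB3))   # $B3: a control change on channel 4
```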

There are two types of channel messages: mode and voice. Mode messages are used to control the polyphony of a synthesizer, and to send all notes off commands. Voice messages are those that control a particular synthesizer voice on a particular channel.

Mode

MIDI allows for several variations in assigning voices to the 16 MIDI channels. These variations are controlled by channel mode messages. The status byte for channel mode messages is the same as for control change messages (a channel voice message). The two are differentiated by the data byte that follows, which is 0-120 for controllers and 121-127 for mode messages.
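
That data-byte split can be expressed directly; a small sketch, with a function name of my choosing:

```python
def control_change_kind(first_data_byte):
    """A $Bn status byte carries either a control change or a channel mode
    message; the first data byte decides: 0-120 controller, 121-127 mode."""
    return "controller" if first_data_byte <= 120 else "mode"

print(control_change_kind(7))     # controller 7 (volume): a control change
print(control_change_kind(123))   # 123 (all notes off): a mode message
```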

The mode messages give you control over whether omni is on or off, and whether the unit is responding in poly (voices assigned polyphonically) or mono (voices assigned monophonically) mode. Omni determines whether the device is responding to voice messages on a given channel (omni off), or to voice messages on all channels (omni on). These messages carry an implicit all notes off command. A separate all notes off mode message is also available.

Some modes let a device respond to more than one MIDI channel at a time. Mode messages are recognized by a receiver only when sent on the basic channel to which the receiver is assigned, regardless of the current mode. Since the modes implemented by a MIDI device are dependent on the actual hardware design, refer to your manual to get a more complete description.

Voice

Voice messages may be received on the basic channel and on other channels—all called voice channels—that are related specifically to the basic channel, depending on which mode has been selected.

Voice messages include all the messages that affect a specific instrument voice, such as note on and note off, pitch bend, modulation, aftertouch, and program number.

System messages

System messages are not encoded with channel numbers. There are three types of system messages: common, real-time, and exclusive.

Common

System common messages are intended for all units in a system, and include such messages as song select and tune request.

Real-Time

System real-time messages consist of a single status byte, and are used for timing and start/stop information. Real-time messages may be interspersed in the MIDI data stream, even within a multibyte message, without affecting the current status. Real-time messages are usually intercepted or generated at the MIDI driver level and used for timing information (when clocking externally, for instance); generally, you will not have to deal with these directly.

Exclusive

System exclusive (or sysex) messages are used to transfer information that may be specific to a given MIDI device. Generally, the actual data used to describe a sound (usually called a program or patch) is not usable by another device, even one from the same manufacturer. This is because the sound generating architecture varies dramatically between devices.

System exclusive messages begin with the system exclusive status byte (240, or $F0), followed by a manufacturer’s ID code. The number of data bytes that follow are determined by the manufacturer. Finally, the message is terminated by an end of exclusive (EOX) status byte (247, or $F7). So as not to get stuck reading an endless system exclusive message if the EOX is missing, the MIDI specification states that any status byte (other than real-time) acts to terminate a system exclusive message.
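
The framing is simple to sketch. This assumes a single-byte manufacturer ID (the spec also allows extended IDs), and the ID and data below are arbitrary values for illustration:

```python
def make_sysex(manufacturer_id, data):
    """Frame a system exclusive message: $F0, manufacturer ID, data, $F7.
    All data bytes must have the MSB clear (0-127)."""
    assert 0 <= manufacturer_id <= 0x7F
    assert all(0 <= b <= 0x7F for b in data)
    return bytes([0xF0, manufacturer_id]) + bytes(data) + bytes([0xF7])

msg = make_sysex(0x43, [0x00, 0x21, 0x7F])   # arbitrary example ID and payload
print(msg.hex())
```

A receiver should apply the termination rule from the spec quoted above: treat any incoming status byte other than real-time as ending the sysex, whether or not the $F7 arrived.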

If you want to write a device editor or librarian stack, you will be primarily concerned with system exclusive messages. The device’s maker specifies its system exclusive format. Some manufacturers include a detailed system exclusive specification with each unit they sell. Others require that you contact them directly to request system exclusive documentation for the device.

System exclusive messages usually get sent as a result of requesting them, either by sending a system exclusive message to your device requesting a patch dump, or by a front panel invocation. As with all MIDI messages, if you receive a system exclusive message that you don’t understand or are not interested in, simply ignore it and all associated data bytes.

A final note on system exclusive: Since this is the most flexible form of MIDI message, you might expect that this is where extensions to the MIDI specification would take place. Well, extensions have already been added here, with certain MIDI Time Code messages—which help to marry MIDI with SMPTE Time Code—and with the Sample Dump Standard format.

Additional status notes

Here are some notes on special status conditions and messages.

Running status

Channel messages (voice and mode) can have running status. That is, if the next channel status byte is the same as the last, it may be omitted. The receiver assumes that the accompanying data is of the same status as was last sent. Receipt of any other status byte except real-time terminates running status.

Running status is especially convenient for sending strings of note-on and note-off messages, when using “note on with velocity of 0” for note off, and for output of continuous controllers. This allows you to cut the length of such strings by one-third.
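
Here’s a sketch of the encoding side, showing the one-third savings on a string of note-ons (the function name is mine, and error handling is omitted):

```python
def encode_with_running_status(messages):
    """Encode (status, data...) tuples into a byte stream, omitting any
    status byte that repeats the previous one (running status)."""
    out, last_status = [], None
    for status, *data in messages:
        if status != last_status:
            out.append(status)
            last_status = status
        out.extend(data)
    return out

# Three note-ons on channel 1; with "note on, velocity 0" standing in for
# note off, whole performances can ride on a single status byte.
msgs = [(0x90, 60, 100), (0x90, 64, 100), (0x90, 67, 100)]
stream = encode_with_running_status(msgs)
print(len(stream), "bytes instead of", 3 * len(msgs[0]))
```

Nine bytes become seven here; as the run gets longer, the savings approach one byte in three.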

Undefined and unimplemented status

Undefined status bytes are reserved and should not be used. Any undefined or unimplemented status bytes received should be ignored. Any subsequent data bytes should be ignored until the next legal status byte is received. In this way, these unused status bytes can be added to the MIDI specification in the future without breaking your program.

Table 1

MIDI byte value summary

Message Hex Decimal Data byte count
Data 00-7F 0-127 n/a
Channel messages
Note off 8n 128+n 2
Note on 9n 144+n 2
Polyphonic key pressure An 160+n 2
Control/Mode change Bn 176+n 2
Program change Cn 192+n 1
Monophonic channel pressure Dn 208+n 1
Pitch bend change En 224+n 2
System exclusive
System exclusive status F0 240 variable
System common
MIDI Time Code (MTC) F1 241 1
Song position pointer F2 242 2
Song select F3 243 1
(Undefined) F4 244 0
Cable select* F5 245 1
Tune request F6 246 0
End of exclusive (EOX) F7 247 0
System real-time
Timing clock F8 248 0
(Undefined) F9 249 0
Start FA 250 0
Continue FB 251 0
Stop FC 252 0
(Undefined) FD 253 0
Active sense FE 254 0
System reset FF 255 0

Note: n is the channel number – 1 (0 is channel 1, 1 is channel 2, …).

* Though officially undefined, some MIDI interfaces use this message to control cable access; a single data byte that follows designates the cable number on which subsequent MIDI messages are routed.


A Java MIDI reference

If you had a Java-equipped browser, you’d see an applet here that looks like this.

Choose the MIDI message type from the message pop-up menu to display the corresponding MIDI message bytes. For channel messages, you can choose the MIDI channel from the channel pop-up menu. You can view the MIDI message bytes in decimal, hexadecimal, or binary.
