Is relative phase audible?

Every continuous tone can be represented as the sum of a bunch of sine waves, with various frequencies, relative volumes, and relative phases. I’ve often heard it asserted by knowledgeable people that we humans cannot hear relative phase relationships – that is, that two tones built from sine waves with the same frequencies and relative volumes, but with different phase relationships, sound the same. This is based in part on early experiments by Helmholtz, the pioneer of much of what we know about the harmonic structure of sounds. The rationale is that the way we hear has to do with tiny “hair cells” in the inner ear, each tuned to a particular frequency, that send out neural signals in proportion to the amount of energy at that frequency. The brain gets a moment-to-moment graph of energy versus frequency, with all the phase info stripped out, so there is simply no way for us to detect phase relationships.

However, just as often I’ve heard it asserted by audio professionals that relative phase is important. We hear about filters (like graphic EQs) introducing “phase problems”, or about time-alignment of loudspeaker drivers. If either of these is a real problem, it must be because we can hear relative phase. For instance, most filters alter the phase of signals at frequencies near the cutoff frequency: so, if you take a complex signal with a lot of harmonics and pass it through a filter, you’ll end up changing the phase relationships of the parts of the signal that make it through, even though the volume and frequency relationships haven’t changed. Is that a problem?
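
To make that concrete, here is a rough sketch (not part of the original experiment) using Python with numpy and scipy; the filter order, cutoff, and probe frequencies are arbitrary choices for illustration. It shows a highpass filter leaving the levels of some harmonics essentially untouched while rotating their phases by quite different amounts:

    # A filter can leave magnitudes nearly alone while shifting phases unevenly.
    # Order, cutoff, and probe frequencies below are arbitrary example values.
    import numpy as np
    from scipy import signal

    fs = 44100
    b, a = signal.butter(2, 100, btype="highpass", fs=fs)  # 2nd-order highpass at 100Hz

    freqs = [220, 440, 660, 880]
    w, h = signal.freqz(b, a, worN=freqs, fs=fs)
    for f, resp in zip(freqs, h):
        gain_db = 20 * np.log10(abs(resp))
        phase_deg = np.degrees(np.angle(resp))
        print(f"{f}Hz: gain {gain_db:+.2f}dB, phase shift {phase_deg:+.1f} degrees")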

It turns out this question is surprisingly easy to answer these days, even though it was hard for Helmholtz. I don’t know whether everyone can hear relative phase, but I sure can.

The experiment
I started by making some raw materials: individual sine waves. I used SoundForge 6.0 to generate 15 seconds of 220Hz sine wave, at -12dBFS. (We’re going to be adding more things to it, so if we started at 0dBFS we’d get clipping, and that would skew the results.) Same thing for 440Hz (the second harmonic), 660Hz (the third), and 880Hz (the fourth). Then I combined these to make my stimuli. First, I copied the 220Hz sine into a new file, and mixed in (using the Paste Special command) the 440Hz sine at equal volume, the 660Hz sine at -6dB, and the 880Hz sine at -12dB. I saved that as my “in phase” sample. Then, I again copied the 220Hz sine into yet another new file, and mixed in the other sines in the same proportions; however, for each of the other sines, I inverted the signal as I mixed it in. (This is an option in the Mix dialog in SoundForge, but if it hadn’t been I could have just inverted the raw materials.) I saved this as my “out of phase” sample.
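
If you’d rather build the same kind of stimuli without SoundForge, here is a rough sketch in Python using numpy and scipy; the sample rate, bit depth, and file names are placeholders, not anything from the original session, and the -6dB and -12dB mix levels are interpreted as relative to the fundamental:

    # Build the "in phase" and "out of phase" stimuli described above:
    # 220Hz at -12dBFS, plus harmonics at equal, -6dB, and -12dB relative levels.
    import numpy as np
    from scipy.io import wavfile

    rate = 44100                      # assumed sample rate
    t = np.arange(15 * rate) / rate   # 15 seconds

    def sine(freq, dbfs):
        """Sine wave at `freq` Hz, `dbfs` decibels below full scale."""
        return 10 ** (dbfs / 20) * np.sin(2 * np.pi * freq * t)

    fundamental = sine(220, -12)
    # -6dB and -12dB relative to the -12dBFS fundamental = -18dBFS and -24dBFS
    harmonics = sine(440, -12) + sine(660, -18) + sine(880, -24)

    in_phase = fundamental + harmonics
    out_of_phase = fundamental - harmonics   # inverting the harmonics = 180-degree shift

    for name, data in (("in_phase.wav", in_phase), ("out_of_phase.wav", out_of_phase)):
        wavfile.write(name, rate, (data * 32767).astype(np.int16))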

I should pause here to point out that what I’m talking about has very little to do with “in phase” and “out of phase” in the sense of what happens when you wire up one speaker or one mic connector backwards (better described as “polarity” than “phase”); nor does it have much to do with phase relationships between microphones, like when you have a close and a far mic on something and you get comb filtering because of the time lag between them.

Anyway, the question is, do these two samples sound the same or do they sound different? The only difference is the phase relationships: there’s exactly the same amount of signal energy at exactly the same component frequencies. If you plotted them on a spectrum analyzer, the two signals would look identical. Do ears behave like spectrum analyzers, or are they fancier than that?
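
If you want to check that claim numerically rather than take it on faith, here is a quick sketch (assuming the file names from the generation sketch above) that compares the magnitude spectra of the two files:

    # The two stimuli should have (near-)identical magnitude spectra;
    # only the phases of the harmonics differ.
    import numpy as np
    from scipy.io import wavfile

    _, a = wavfile.read("in_phase.wav")        # assumed file names
    _, b = wavfile.read("out_of_phase.wav")

    spec_a = np.abs(np.fft.rfft(a.astype(float)))
    spec_b = np.abs(np.fft.rfft(b.astype(float)))

    # Largest relative difference between the two spectra; expect something tiny.
    print(np.max(np.abs(spec_a - spec_b)) / np.max(spec_a))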

You can try for yourself. Here’s a 3.3-second excerpt of the in-phase wave, and here’s the same length of the out-of-phase one. They’re in .wav format rather than .mp3, because I don’t trust an MP3 to preserve relative phase – after all, the psychoacousticians say you can’t hear it, so a compression algorithm shouldn’t need to preserve it! (I don’t know, maybe they preserve it anyway; I just don’t trust it.) Play them and listen! Can you hear the difference? My guess is that some people can and some can’t.

Blind trials
But, it’s easy to fool yourself. If you want to be a bit more systematic about it, you might want to try a kinda-blind test. (This is still not perfect; ideally we’d use the computer to do a truly randomized, truly blind test. But it’s better than nothing.) Find a friend, and get them to flip a coin 10 times and write down the outcomes without telling you. Now, put the friend in front of the computer, and instruct them to play first the in-phase .wav, then the inverted one, and then one or the other depending on whether that trial’s coin toss came up heads or tails. They say nothing during this. You listen, decide whether the third “mystery” sample was in phase or inverted, and write it down. Same thing, ten times in a row; the friend says nothing throughout. After all ten, compare your judgements with the coin tosses. If you can’t hear the difference, around half your answers will be wrong. If you got nine or ten right, that’s pretty good evidence that you can hear the difference (pure guessing would do that well only about 1% of the time). Have fun! To me they sound quite different; on casual listening my wife thought they sounded the same, but when she paid attention to them she was able to get them right nearly every time.
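
If you’d like the computer to do the randomizing instead of a friend with a coin, here is a rough sketch of a self-administered blind test in Python; the file names and trial count are placeholders, and you’d play the trial files in whatever audio player you like:

    # Copy the two stimuli to randomly assigned trial files, let the listener
    # guess which is which, then score the guesses against the hidden key.
    import random
    import shutil

    FILES = {"in": "in_phase.wav", "out": "out_of_phase.wav"}   # assumed file names
    TRIALS = 10

    key = [random.choice(["in", "out"]) for _ in range(TRIALS)]
    for i, answer in enumerate(key, start=1):
        shutil.copyfile(FILES[answer], f"trial{i:02d}.wav")

    input("Listen to trial01.wav through trial10.wav, then press Enter to be quizzed.")
    correct = 0
    for i, answer in enumerate(key, start=1):
        guess = input(f"trial{i:02d} (in or out)? ").strip().lower()
        correct += (guess == answer)
    print(f"{correct}/{TRIALS} correct; pure guessing averages {TRIALS // 2}.")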

What next?
The next steps would be to try this with lower and higher frequencies: can I hear relative phase down at 40Hz? If so, I’d better be careful about how I design the highpass filters that block DC and subsonic rumble from getting through my audio gear, because they introduce phase shifts down around there. Can I hear relative phase at 5kHz and above? If so, then maybe those claims about phase-aligned speakers aren’t just marketing voodoo. And then, how small a phase difference can I hear? In this experiment it was the biggest possible difference: the harmonics were shifted a full 180 degrees relative to the fundamental tone. Can I hear 90 degrees? 30 degrees? Most filters don’t skew things as radically as they were skewed in this experiment, so perhaps the effect is nothing to worry about.
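
Generating those gentler test cases is a small tweak to the earlier sketch: shift the harmonics by an arbitrary number of degrees instead of flipping them outright. Again, the file names and the particular angles below are just placeholders:

    # Shift every harmonic by `phase_deg` degrees relative to the fundamental.
    # A value of 180 reproduces the "out of phase" stimulus from before.
    import numpy as np
    from scipy.io import wavfile

    rate = 44100
    t = np.arange(15 * rate) / rate

    def sine(freq, dbfs, phase_deg=0.0):
        return 10 ** (dbfs / 20) * np.sin(2 * np.pi * freq * t + np.radians(phase_deg))

    def stimulus(phase_deg):
        return (sine(220, -12)
                + sine(440, -12, phase_deg)
                + sine(660, -18, phase_deg)
                + sine(880, -24, phase_deg))

    for deg in (30, 90, 180):
        data = stimulus(deg)
        wavfile.write(f"shift_{deg}deg.wav", rate, (data * 32767).astype(np.int16))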

And hey, maybe my methodology is wrong. If anyone sees a flaw in the way I did the experiment, let me know! I can think of one possible issue: because of the way the harmonics stack up, slew-rate distortion could be more of a problem in one sample than in the other. I don’t think it is, though, because I’m playing this at low volume on a system with an M-Audio Delta 2496 sound card through Mackie HR824s, so I’m well within the limits of the system.
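
One quick way to put a number on that worry is to compare the peak level and the steepest sample-to-sample step in the two files. This sketch assumes the file names from the earlier sketches and 16-bit data:

    # Compare peak amplitude and maximum per-sample step for both stimuli;
    # a big difference here would hint that one file stresses the playback chain more.
    import numpy as np
    from scipy.io import wavfile

    for name in ("in_phase.wav", "out_of_phase.wav"):
        _, data = wavfile.read(name)
        data = data.astype(float) / 32768.0          # back to the -1..1 range
        peak = np.max(np.abs(data))
        step = np.max(np.abs(np.diff(data)))
        print(f"{name}: peak {peak:.3f}, max per-sample step {step:.3f}")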