Image depth

Can anyone offer a technical explanation of how a stereo system recreates image depth? Why are some center images behind the speakers and others in front of them, for example?
Should there be any depth to a mono recording, or should the image be directly in line with the speakers?

Showing 3 responses by roberttdid

While room echoes can help with localization, much of what you experienced is psychoacoustic (suggestion coupled with familiarity) unless they used a proper torso simulator for the capture, and if memory serves, that was not done for this test recording.

You’d never hear him behind you if speakers were needed back there. Instead all that’s needed is to faithfully reproduce in detail the full acoustic signature of the event. Especially including the faint room echoes.

Your ears/brain use several things for placement of a sound:
  • The difference in arrival time of approximately the same sound at your two ears gives you angular position. This works best from about 120 Hz to about 1500 Hz, but predominantly below 800 Hz. This is the highly accurate timing you hear attributed to the ears/brain, but keep in mind it is phase difference, not absolute timing.
  • Spectral notches due to head shape provide front/back cues, with the emphasis on cues: they are not perfectly accurate.
  • Most of depth comes from volume cues.
  • Volume cues can also give angular position, predominantly above 1500 Hz, though the effect starts at about 1000 Hz. One side of your head shields the other side, resulting in a level difference that depends on frequency.
  • Some filtering of frequencies by your torso and pinnae provides some height cues, but the ear/brain is not great at height detection.
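To put rough numbers on the timing and volume cues above, here is a minimal sketch. It assumes a simple spherical-head model (Woodworth's approximation), a head radius of 8.75 cm, a speed of sound of 343 m/s, and free-field inverse-square level loss. None of those figures come from this thread, and real heads and real rooms will differ.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, not a measured value
SPEED_OF_SOUND = 343.0   # m/s, roughly room temperature (assumption)


def itd_seconds(azimuth_deg):
    """Interaural time difference for a source at the given azimuth.

    Woodworth spherical-head approximation, intended for 0..90 degrees,
    where 0 is straight ahead and 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def interaural_phase_deg(freq_hz, azimuth_deg):
    """Phase difference between the ears for a steady tone."""
    return 360.0 * freq_hz * itd_seconds(azimuth_deg)


def phase_ambiguity_hz(azimuth_deg):
    """Frequency at which the phase difference reaches 180 degrees.

    Above this the phase cue becomes ambiguous for that angle.
    Not defined for azimuth 0, where the time difference is zero.
    """
    return 1.0 / (2.0 * itd_seconds(azimuth_deg))


def level_change_db(near_m, far_m):
    """Level change when a source moves from near_m to far_m (free field)."""
    return -20.0 * math.log10(far_m / near_m)


if __name__ == "__main__":
    for az in (15, 30, 60, 90):
        print(az, "deg:",
              round(itd_seconds(az) * 1e6), "us ITD,",
              round(phase_ambiguity_hz(az)), "Hz ambiguity limit")
    print("1 m -> 2 m:", round(level_change_db(1.0, 2.0), 1), "dB")
```

Under these assumptions, a source directly to one side gives a time difference of well under a millisecond, and the phase cue for that angle turns ambiguous in the high hundreds of Hz, which is in line with the "predominantly below 800 Hz" figure. Doubling the distance costs about 6 dB in a free field; in a real room the reverberant field softens that.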

With most recording techniques, even for live music, most of the angular position information is lost. What you perceive in the recording is artificial, put there by the recording engineer.

There are microphone techniques, both with torso/head simulators and with stereo microphones, that can capture, or theoretically capture, the differential timing of what a human would hear. The big caveat, however, is that being able to use that on playback is pretty much limited to headphones, though you may get lucky with speaker placement and the odd recording and extract some of it.

Height information is highly specific to individuals, so capturing it on 2 channels is pretty much impossible, and in practice it is not done.
So what does this mean? Most of the sound-stage, imaging, etc., even in live recordings, is indicative of the recording and far more influenced by the recording/mixing process than by the original event. To that end, most of what audiophiles describe is "simulated", especially height.

It also means when someone says wall-to-wall sound-stage, that is probably either hyperbole or a pleasant but highly inaccurate representation of the music.
Looking at that picture in Soundstage, the speakers are too close to the back wall, or window as the case may be. That's a very reflective surface to have a speaker in front of.

This seems to be equivalent to moving the speakers farther out into the room. This is usually enhanced with a diffuser panel between the speakers on the front wall.
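As a rough illustration of why distance from that reflective surface matters, here is a minimal sketch. It assumes the wall behind the speaker is a perfect reflector, the listener is on axis, and the speaker radiates equally front and back. Those are simplifications of mine, not anything measured in that room, and they hold best at low frequencies.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def reflection_delay_ms(distance_to_wall_m):
    """Extra delay of the wall bounce relative to the direct sound.

    The reflected sound travels to the wall and back, so the extra
    path is twice the speaker-to-wall distance.
    """
    return 2.0 * distance_to_wall_m / SPEED_OF_SOUND * 1000.0


def first_notch_hz(distance_to_wall_m):
    """Lowest frequency where the bounce arrives half a cycle late.

    At that frequency the reflection partially cancels the direct sound.
    """
    return SPEED_OF_SOUND / (4.0 * distance_to_wall_m)


if __name__ == "__main__":
    for d in (0.3, 0.5, 1.0, 1.5):
        print(d, "m:",
              round(reflection_delay_ms(d), 2), "ms delay,",
              round(first_notch_hz(d)), "Hz first notch")
```

Under these assumptions, pulling the speakers farther out makes the reflection arrive later and pushes the first cancellation lower in frequency, while a diffuser or absorber on that wall weakens the reflection itself.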