Image depth


Can anyone offer a technical explanation of how a stereo system recreates image depth? Why are some center images behind the speakers, and others in front of the speakers, for example?
Should there be any depth to a mono recording, or should the image be directly in line with the speakers?
cakids
Furthermore, it seems to me that a few microphones, no matter where they are positioned during recording, cannot possibly duplicate a music waveform’s amplitude and phase exactly as it would occur at a live listener’s ears. So how can an accurate soundstage be created?
Do an internet search using the phrase "stereo recording techniques". There are a vast number of articles describing different microphone placements and their impact on stereo sound reproduction.
Can anyone offer a technical explanation of how a stereo system recreates image depth?

As far as the recordings go, I’ll leave that for others, but to get your system to extract the most from them, do this:

Remove anything between the speakers (equipment racks, etc.) and you’ll increase the depth perspective as your ears and eyes hear and see it.

Then one "BIG" step further is to remove the back wall from in between the speakers like I did, leaving a little 1-2mt behind each speaker for bass loading, and then hear and see your image depth go back much further to the back wall of the next room.

Cheers George
What I’m getting from reading about recording techniques is that good recording techniques will fool the ear-brain location-finding function to create an approximate aural image, but do not actually duplicate the exact sonic signature (amplitude, phase) that would impinge on the ears during a live performance. The exception is binaural recording, which puts the mikes on a simulated human head.
Your ears/brain use several things for placement of a sound:
  • The difference in arrival time of approximately the same sound at your two ears gives you angular position. This works best from about 120 Hz to about 1500 Hz, but predominantly below 800 Hz. This is the highly accurate timing you hear attributed to the ears/brain, but keep in mind it is phase difference, not absolute timing.
  • Spectral notches due to head shape provide front/back cues, emphasis on cues. It is not perfectly accurate.
  • Most of depth comes from volume cues.
  • Volume cues can also give angular position, predominantly at high frequencies above 1500 Hz, though starting at about 1000 Hz. One side of your head shields the other side, resulting in a level difference that depends on frequency.
  • Some filtering of frequencies by your torso and pinnae provides some level of height cues, but the ear/brain is not great at height detection.
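
To put rough numbers on the arrival-time cue in the list above, here is a minimal sketch. It assumes Woodworth's spherical-head approximation, a textbook model that is not from this thread, and the head radius and speed of sound are typical assumed values, not measurements. The function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at roughly 20 °C)
HEAD_RADIUS = 0.0875    # m, assumed typical adult value


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Estimate interaural time difference for a distant source.

    Uses Woodworth's spherical-head approximation:
        ITD = (r / c) * (theta + sin(theta))
    azimuth_deg: 0 is straight ahead, 90 is fully to one side.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


def phase_difference_deg(freq_hz, azimuth_deg):
    """Phase difference between the ears for a pure tone, in degrees."""
    return 360.0 * freq_hz * itd_seconds(azimuth_deg)


if __name__ == "__main__":
    for azimuth in (0, 15, 30, 60, 90):
        itd_us = itd_seconds(azimuth) * 1e6
        print(f"azimuth {azimuth:2d} deg: ITD about {itd_us:5.0f} microseconds")
    for freq in (120, 500, 800, 1500):
        phase = phase_difference_deg(freq, 90)
        print(f"{freq:4d} Hz at 90 deg: phase difference {phase:5.1f} deg")
```

Under these assumptions a source fully to one side gives an arrival-time difference of about 0.66 ms. The phase difference passes half a cycle (180 degrees) somewhere below 800 Hz for that source, after which phase alone becomes ambiguous. That is at least consistent with the frequency range given above.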

With most recording techniques, even for live music, most of the angular position information is lost. What you perceive in the recording is artificial, put there by the recording engineer.

There are microphone techniques, both with torso/head simulators and with stereo microphones, that can capture, or theoretically capture, the differential timing of what a human would hear. The big "however" is that being able to use that on playback is pretty much limited to headphones, though you may get lucky with speaker placement and the odd recording and extract some of that.

Height information is highly specific to individuals, so capturing it on 2 channel is pretty much impossible, and actually not done.
So what does this mean? Most of the sound-stage, imaging, etc., even in live recordings, is indicative of the recording and is far more influenced by the recording/mixing process. To that end, most of what audiophiles describe is "simulated", especially height.

It also means that when someone says wall-to-wall sound-stage, that is probably either hyperbole or a pleasant but highly inaccurate representation of the music.
What I’m getting from reading about recording techniques is that good recording techniques will fool the ear-brain location-finding function to create an approximate aural image, but do not actually duplicate the exact sonic signature (amplitude, phase) that would impinge on the ears during a live performance.


Right. It doesn’t have to be the exact same information. It just has to be enough of the essential information.

The (highly recommended) XLO Test CD has a wonderful track where Roger Skoff simply describes the room dimensions and microphone placement and where he is in the room. As he’s talking you realize that what you are hearing is exactly as if you were there in the room. Not your room, his. He walks and talks and occasionally hits a clave (wood block), letting you hear the acoustic signature of the room. At one point he walks to the extreme back of the room and hits the clave. It sounds as if he is behind you.

How technically is it possible for two speakers in front of you to create the illusion not only of depth front to back but even behind?

The answer is that with sound from further away, it’s not just that there’s a time difference between the direct and reflected sounds; there is also a frequency and volume difference. Everything about the sound changes depending on where in the recording space the sound originates. That is the lesson of the XLO track. You’d never hear him behind you if speakers were needed back there. Instead, all that’s needed is to faithfully reproduce in detail the full acoustic signature of the event, especially including the faint room echoes.
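
As a rough illustration of how level and echo timing change with distance, here is a minimal sketch under idealized assumptions that are mine, not the poster's: a point source in otherwise free space, one perfectly reflecting flat wall, and the source and microphone both the same distance from that wall. The function names are hypothetical. It does not model the frequency changes mentioned above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at roughly 20 °C)


def direct_level_db(distance_m, reference_m=1.0):
    """Level of the direct sound relative to the level at reference_m.

    Assumes inverse-square spreading from a point source.
    """
    return -20.0 * math.log10(distance_m / reference_m)


def reflected_path_m(distance_m, wall_offset_m):
    """Path length of the single wall reflection (image-source method).

    Source and microphone are both wall_offset_m from the wall and
    distance_m apart, measured parallel to the wall.
    """
    return math.hypot(distance_m, 2.0 * wall_offset_m)


def reflection_delay_ms(distance_m, wall_offset_m):
    """How long after the direct sound the wall reflection arrives."""
    extra = reflected_path_m(distance_m, wall_offset_m) - distance_m
    return extra / SPEED_OF_SOUND * 1000.0


def reflection_level_db(distance_m, wall_offset_m):
    """Level of the reflection relative to the direct sound (lossless wall)."""
    ratio = reflected_path_m(distance_m, wall_offset_m) / distance_m
    return -20.0 * math.log10(ratio)


if __name__ == "__main__":
    wall_offset = 2.0  # metres, arbitrary example value
    for d in (1.0, 2.0, 4.0, 8.0):
        print(
            f"source at {d:3.0f} m: direct {direct_level_db(d):6.1f} dB, "
            f"echo delay {reflection_delay_ms(d, wall_offset):5.2f} ms, "
            f"echo level {reflection_level_db(d, wall_offset):5.1f} dB re direct"
        )
```

In this toy geometry, as the source moves away the direct sound gets quieter, the echo arrives sooner after it, and the echo gets closer to it in level. Those are the kind of changes in the "acoustic signature" that a recording can carry.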

The speakers must of course be symmetrical and equidistant. This matters more L to R than front to back. I’m simplifying, obviously. But really, it’s the spatial information captured in the acoustic signature that does the trick more than anything else.
While room echoes can help in locating, much of what you experienced is psycho-acoustic (suggestion, coupled with familiarity) unless they used a proper torso simulator for the capture and if my memory serves, that was not done for this test recording.


You’d never hear him behind you if speakers were needed back there. Instead all that’s needed is to faithfully reproduce in detail the full acoustic signature of the event. Especially including the faint room echoes.

"Then one "BIG" step further is to remove the back wall from in between the speakers like I did, leaving a little 1-2mt behind each speaker for bass loading"

Awesome George!
Thank you all for filling in a lot of holes in my understanding. I can remember my first experience of depth and soundstage size. It was in a small hotel room at a NY show years ago. The walls disappeared and there was a live orchestra stretching about 40 or 50 feet behind the speakers. Happened to be Swans speakers and Boulder electronics. I know that images can be behind or in front of the speakers, but didn’t have quite as clear a technical understanding of what made it happen.
"Then one "BIG" step further is to remove the back wall from in between the speakers like I did, leaving a little 1-2mt behind each speaker for bass loading" Awesome George!
It gave me double the depth and imaging that I had before with the wall in place.
Here is a rough drawing of it: https://ibb.co/9g5VW5W (when it opens, click it again to enlarge it). Many in Australia have copied it after hearing/seeing what kind of image and depth it presents.

The speakers are ML Monolith IIIs with the much better Neolith ESL panels; the 12" bass drivers are also much better than the stock ones.

And later if you need to sell the house, you just put in 2 x double fold away doors that fold back against each of the 2 short walls.

If you have a look, my editor/reviewer friend at Soundstage did something similar with his Alexias after hearing mine, installing a big bay window between the speakers to give more distance (the equipment rack is in between just for picture purposes and lives off to the side, out of sight).
https://forum.audiogon.com/discussions/wilson-alexia-2-3

Cheers George

Looking at that picture in Soundstage the speakers are too close to the back wall or window as it may be. That's a very reflective surface to have a speaker in front of.

This seems to be equivalent to moving the speakers farther out into the room. This is usually enhanced with a diffuser panel between the speakers on the front wall.
Interesting read on the topic:

https://ohmspeaker.com/news/are-we-out-of-depth/

https://en.m.wikipedia.org/wiki/Hafler_circuit

I have heard highly customized dealer showrooms set up to deliver realistic imaging depth with appropriate orchestral recordings, meaning specific player locations in a large 3-D area could be identified exactly with a 2-channel system, but few if any rooms in people’s homes are designed for that.

Rectangular rooms or anything like that are the limitation. 

Same setup by same dealer at a show in a more conventional room: Not so much....more like all the rest.
I just completed most of the mods/upgrades in my system. The list is too long to describe.
New high-tech mods affect the sound's frequency resolution, and that equals depth.
Check all my YouTube uploads, especially my latest, as my sound now has depth. That is frequency resolution/instrumental separation.

Depth is real.
What before was FLAT now has depth.
Yeah, it cost me a bunch, but it was well worth it.
Most stock components need mods to achieve depth.
An amp lab is not going to put in high-price/high-tech parts. Just not. Never did, never will.
I find that soundstage size is a direct function of the placement, but imo the equipment/electronics dominate the matter of depth of soundstage. 

One of the variables that influence depth of soundstage is the sense of width. I use the PureAudioProject Trio15 Horn 1 Speakers (see my reviews at Dagogo.com on the variants of this speaker) both Portrait (vertical) and Landscape (horizontal). The soundstage changes dramatically with these two options. But, still, the biggest impact on depth is the electronics and cables. 
I'm not sure this is relevant to your question, but it is interesting how people differ in their perception of music and speech.

http://deutsch.ucsd.edu/psychology/pages.php?i=201#Technical.php
This is a great article on the subject by Siegfried Linkwitz... jump to section 3 if you're impatient and 3.2.1 for his '5 requirements'