The “They are here” vs “You are there” sound topic


Hi all,

I want to start a topic about the “They are here” vs “You are there” type of sound. I have read that audiophiles usually fall into one of the two categories, but what does it actually mean? So here are a few questions:

- what is the definition of “They are here” vs “You are there” in your opinion?
- what is the main difference in sound? E.g. soundstage
- which kind of sound do you prefer?
- which type of speakers fall in one or the other category in your opinion?
- what type of sources, amplifiers or even cables fall in one or the other category in your opinion?

For instance, I believe the Esoteric products from Japan fall into the “they are here” type of sound. Do you feel the same?
richardhk

Showing 5 responses by audiokinesis

" Wait, what? Duke hasn’t posted yet??? Must be working on it still then.... " 

Yeah baby!
Thank you millercarbon and bryhifi for your encouragement. Bryhifi, I’m going to continue along the lines of what you quoted.

Apologies in advance for the length of this post. Imo this is a complex topic, and brevity eludes me.

I’m going to start from the assumption that the recording contains a plausible “you are there” acoustic signature. Obviously such is not always the case, but given that the more challenging of the two would be to recreate “you are there” in a home listening room, let’s make the ability to do so our intention.

In the listening room there is, in effect, a "competition" between the acoustic signature of the venue on the recording (whether real or engineered or both), and the acoustic signature of the room we are listening in. Let’s call these the "First Venue" and the "Second Venue", respectively. When the First Venue cues dominate (and all else is good), then "you are there." When the Second Venue cues dominate (and all else is good), then "they are here." In general, getting the First Venue cues to dominate is easier said than done, as the Second Venue cues naturally tend to dominate in most home listening rooms.

So while the end result will inevitably be recording-dependent, let’s give all of our recordings the best chance we reasonably can, by effectively presenting the First Venue cues while disrupting the Second Venue cues.

In order for the First Venue cues to be effectively presented, they need to be strong enough for us to hear them; they need to be easily recognizable by the ear; they need to arrive from many different directions; and they need to not die away too quickly.

The First Venue cues are of course included in the direct sound, but that’s arguably the worst possible direction for reflections to come from. Fortunately they are also included in the reflected energy in the room. The ear/brain system can pick out those First Venue ambience cues from the reflections in the listening room based on their spectral content, and connect them to the appropriate first-arrival sounds. Timbre is also enriched along the way.

First Venue cues "strong enough for us to hear them" means that we need a fair amount of reverberant energy, which implies wide-pattern or polydirectional speakers and/or a room that is not overdamped. The latter helps ensure that they "don’t die away too quickly". And the wide/polydirectional pattern + ideally a lot of diffusion = the First Venue cues "arrive from many different directions."

In order for the First Venue cues to be "easily recognizable by the ear", they must be spectrally correct. This implies that the spectral balance of the off-axis energy is similar to the spectral balance of the first-arrival sound, AND that the room doesn’t over-absorb the short wavelengths (high frequencies) and correspondingly degrade the spectral balance of the reverberant energy. Of course we want to avoid slap echo, so there’s a balance we’re looking for, and in general diffusion serves that goal better than absorption.

But imo this is only HALF the battle.

The other half is, we want to weaken and/or disrupt the "Second Venue" cues - that is, the inherent acoustic signature of the listening room.

Undesirable Second Venue "small room signature" is strongly conveyed by the earliest reflections, and in general the earlier their arrival, the stronger the effect. So we want to avoid early reflections as much as possible; and/or diffuse them such that they are not strong and distinct ("specular"); and/or absorb them uniformly. The latter cannot be accomplished by a few inches of foam, which soaks up the short wavelengths but has little effect on longer ones, and thereby screws up the spectral balance of the reflections. We want the reflections to decay fairly slowly (though not too slowly), as quick decay is another source of "small room signature", which is another reason to use something other than absorption to address the early reflections, where possible.
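
To put rough numbers on the foam point, here is a small sketch. It leans on a common rule of thumb (not something claimed above) that a porous absorber mounted directly on a wall loses effectiveness once its thickness drops below roughly a quarter of the wavelength. The speed of sound value and the function names are assumptions for illustration, and real foam will vary with density and mounting.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at about 20 C
INCH_TO_M = 0.0254


def wavelength_m(freq_hz):
    """Wavelength in metres for a given frequency in Hz."""
    return SPEED_OF_SOUND_M_PER_S / freq_hz


def quarter_wave_cutoff_hz(thickness_in):
    """Frequency whose quarter wavelength equals the absorber thickness.

    Rule-of-thumb estimate only; below this the foam does progressively less.
    """
    return SPEED_OF_SOUND_M_PER_S / (4.0 * thickness_in * INCH_TO_M)


for f in (100, 1000, 10000):
    print(f"{f} Hz -> wavelength about {wavelength_m(f):.2f} m")

for t in (1, 2, 4):
    print(f"{t} in of foam -> weak below roughly {quarter_wave_cutoff_hz(t):.0f} Hz")
```

Under these assumptions even four inches of foam starts to run out of steam somewhere below 1 kHz, which is the spectral-balance problem in a nutshell.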

If we can impose a significant delay on the strong onset of reflections, and push that inrush of reflections back in time somewhat, we can disrupt the "small room signature" cues by introducing contradictory "somewhat larger room" cues. An example of this would be, putting Maggies well out into the room such that it takes a while for the reflections off the wall behind the speakers to reach the listening area. About five feet seems to work well, though greater distance often works better. This relatively late-onset inrush of reverberant energy contradicts the normal "small room signature" cues we would otherwise get. So we end up with relatively indistinct Second Venue cues, which makes it more likely that our effectively-presented First Venue cues will dominate. Thus Maggies and many other polydirectionals are capable of doing "you are there" rather well with proper set-up, and we easily hear the different "there’s" from one recording to the next, which indicates the First Venue is indeed dominant, rather than merely an enhanced (by the longer reflection paths) Second Venue. With more conventional speakers the same principles apply, including: Minimize the early reflections (via diffusion or angled reflectors or even angled side walls if we’re building a dedicated room) while cultivating the late ones.
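
For a feel of what that distance buys in time, here is a back-of-the-envelope sketch. It assumes a speed of sound of about 1130 ft/s and treats the rear-wall bounce as travelling roughly twice the speaker-to-wall distance farther than the direct sound, which ignores listening angle and toe-in. The function name is just for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed: air at roughly room temperature


def rear_wall_delay_ms(distance_to_wall_ft):
    """Approximate delay of the rear-wall reflection relative to the direct sound.

    Simplifying assumption: the reflected path is longer than the direct
    path by about twice the speaker-to-wall distance.
    """
    extra_path_ft = 2.0 * distance_to_wall_ft
    return 1000.0 * extra_path_ft / SPEED_OF_SOUND_FT_PER_S


for d in (1, 3, 5, 8):
    print(f"{d} ft from the wall -> about {rear_wall_delay_ms(d):.1f} ms")
```

With these assumptions, five feet works out to a bit under 9 ms of delay, versus under 2 ms for a speaker a foot from the wall.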

Compared with all the cues we’d get in the actual venue, even the best stereo system presents us with a poverty of First Venue cues. The ear takes in all of these different and often contradictory cues and constructs a "best fit" impression of the acoustic space we are in. If we have effectively presented the First Venue cues while minimizing/disrupting/degrading the Second Venue cues, with a good recording that "best fit" may well end up being a reasonable facsimile of the acoustic space of the recording (again, whether real or engineered or both).

I’m not saying this is the ONLY thing that goes into a "you are there"-capable system, but it’s arguably one of the things. And, note that a professional acoustician can make a small room behave like a much larger and much better space. For many of us, the services of a professional acoustician will make the biggest difference between “they are here” and “you are there.”

Imo, ime, ymmv, etc.

Duke

"The "you are there" acoustical signature on the recording is reverb and lots of it."

I agree, assuming the recording is done well.

"Forget about off axis response. Fix the tonality or you will never be happy."

Personally I do place tonality ahead of spatiality on my list of priorities, but tonality was not the topic of this thread. In general I agree with the approach of fixing first that which matters most.

Imo your injunction to "forget about the off axis response" overlooks a vital aspect of tonality: Most of the sound you hear in most rooms started out as off-axis response.

You can EQ the response such that the sum of on-axis + off-axis = the tonality you desire, but if there was a significant spectral discrepancy between the two to begin with then it’s still there, and listening fatigue may arise over time. Let me explain:

The ear/brain system examines each incoming sound to see if it is a new sound or a reflection. It does so by comparing the spectral content to sounds recently stored in a short-term memory. If there is a match, then it’s a reflection and its directional cues are suppressed, but it still contributes to tonality and loudness. If there is no match then it’s a new sound, and a copy goes into short-term memory for comparison with subsequent incoming sounds. This suppression of directional cues from reflections is called the "Precedence effect", and it’s what allows us to reliably determine the direction of a sound source in a reverberant environment... useful for knowing where to look and/or where to run when a predator snaps a twig in the forest.

When there is a significant discrepancy between the spectral content of the initial sound and its reflection, the ear/brain system has to work correspondingly harder to make the correct match. Over time this can tire that portion of the brain and result in listening fatigue, sometimes literally manifesting as a headache.

One EQ-based way to minimize these spectral discrepancies might be to use a fairly directional (or possibly nearfield) EQ’d main array and a dedicated, separately-EQ’d reverberant-field-only array. Position and aim the second arrays (one for each channel) such that their outputs arrive after as much path-length-induced time delay as is reasonably feasible.

Duke

Kenjit wrote: "you say that [tonality is your priority] but your focus is and always has been off axis response. That is your holy grail."

My interest in getting the off-axis response right arose from its beneficial effects on timbre. Subsequently I found other worthwhile benefits. So to me, getting the off-axis response right is a means to an end, and that end is multi-faceted, and its main facet is timbre.

(A major difference between live and reproduced sound is, what’s happening in the reverberant field. You can walk past an open doorway with no line-of-sight to the sound source, such that all you can hear is the reverberant sound, and instantly you know whether it’s live or a recording. There are some simplification assumptions in this statement, but not all that many.)

" Cheap dsp speakers costing a few hundred bucks can give you smooth perfect off axis response if thats the goal. "

If there is a significant discrepancy between the on-axis response and the off-axis response, that discrepancy shows up in the shape of the radiation pattern. EQ cannot correct the radiation pattern shape, so it cannot simultaneously correct the on-axis and off-axis response if the radiation pattern has problems. (When the radiation pattern does not have problems, on-axis and off-axis issues ARE simultaneously corrected, which is what makes such speakers good candidates for DSP.)
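
A toy illustration of why, with made-up numbers: EQ is applied to the signal before it reaches the drivers, so in this simplified model it shifts the response in every direction by the same amount at each frequency. The responses below are hypothetical, not measurements of any real speaker.

```python
# Hypothetical responses in dB at a few frequencies (illustrative values only).
freqs_hz = [500, 2000, 8000]
on_axis_db = [0.0, 2.0, -1.0]
off_axis_db = [-1.0, -4.0, -9.0]


def apply_eq(response_db, eq_db):
    """Add the same per-frequency EQ gain to a response, whatever its direction."""
    return [r + g for r, g in zip(response_db, eq_db)]


# EQ chosen to flatten the on-axis response.
eq_db = [-x for x in on_axis_db]

on_eq = apply_eq(on_axis_db, eq_db)
off_eq = apply_eq(off_axis_db, eq_db)

# The on-axis minus off-axis gap is the radiation pattern's fingerprint.
gap_before = [a - b for a, b in zip(on_axis_db, off_axis_db)]
gap_after = [a - b for a, b in zip(on_eq, off_eq)]

for f, before, after in zip(freqs_hz, gap_before, gap_after):
    print(f"{f} Hz: gap before EQ {before:.1f} dB, after EQ {after:.1f} dB")
```

The on-axis curve comes out flat, but the gap between on-axis and off-axis is the same before and after, which is the discrepancy EQ cannot touch.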

So smooth perfect off-axis response is not THE goal. It is one of many.

If you are able to achieve satisfactory results with cheap DSP speakers, congratulations! If you are willing to share the secrets of your success, even better.

"My definition of tonality is completely different than yours. Tonality, according to my definition is the area well below the crossover point. So it has nothing to do with off axis response."

In that case, our use of the term is too different for us to move forward with its use. Do you accept the dictionary definition of "timbre"?

Duke