speakers for 24/96 audio


Is it correct to assume that 24/96 audio would be indistinguishable from CD quality when played through speakers with a -3 dB point at 20 kHz and a rapid high-frequency roll-off?

Or, more precisely, that the only benefit comes from the shift from 16 to 24 bits, not the increased sample rate, as the higher-frequency content is filtered out anyhow?

Related to this, what would you recommend as a sub-$5k speaker set with good high-frequency capabilities for 24/96 audio?

thanks!
mizuno

Showing 4 responses by bryoncunningham


Hi Al and Shadorne.

Thanks for your thoughtful responses. Everything you guys said makes sense to me, but I do have some additional thoughts...

07-04-11: Almarg
1) He has apparently established that listeners can reliably detect the difference between a single arrival of a specific waveform, and two arrivals of that waveform that are separated by a very small number of microseconds. I have difficulty envisioning a logical connection between that finding, though, and the need for hi rez sample rates. There may very well be one, but I don’t see it.

I believe Kunchur addresses this in this document, in which he says:

For CD, the sampling period is 1/44100 ~ 23 microseconds and the Nyquist frequency fN for this is 22.05 kHz. Frequencies above fN must be removed by anti-alias/low-pass filtering to avoid aliasing. While oversampling and other techniques may be used at one stage or another, the final 44.1 kHz sampled digital data should have no content above fN. If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together and the essential feature (the presence of two distinct peaks rather than one blurry blob) is destroyed. There is no ambiguity about this and no number of vertical bits or DSP can fix this. Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks).
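
As a quick check on the figures in that passage, here is a minimal Python sketch that computes the sampling period and Nyquist frequency for a few common sample rates. The list of rates and the helper name are my own choices for illustration; they do not come from Kunchur's document.

```python
# Sampling period and Nyquist frequency for common audio sample rates.
# The list of rates and the function name are illustrative assumptions.

def sampling_figures(rate_hz):
    """Return (sampling period in microseconds, Nyquist frequency in kHz)."""
    period_us = 1e6 / rate_hz
    nyquist_khz = rate_hz / 2 / 1000
    return period_us, nyquist_khz

for rate in (44_100, 96_000, 192_000):
    period_us, nyquist_khz = sampling_figures(rate)
    print(f"{rate} Hz: period = {period_us:.2f} us, Nyquist = {nyquist_khz:.2f} kHz")
```

For 44.1 kHz this gives roughly 22.7 μs and 22.05 kHz, in line with the quote, and for 96 kHz roughly 10.4 μs and 48 kHz. It only reproduces the arithmetic; it says nothing about whether the conclusion drawn from it is right.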

In essence, I understand him to be saying that the temporal resolution of human hearing is around 6 μs. But the temporal resolution of the 44.1 sampling rate is around 11 μs. Since the temporal resolution of human hearing is better than the temporal resolution of 44.1 recordings, those recordings fail to accurately represent very brief signals that are both audible and musically significant. For example, Kunchur says:

In the time domain, it has been demonstrated that several instruments (xylophone, trumpet, snare drum, and cymbals) have extremely steep onsets such that their full signal levels, exceeding 120 dB SPL, are attained in under 10 μs…

He also suggests that the temporal resolution of 44.1 recordings might be inadequate to fully represent the reverberation of the live event:

A transient sound produces a cascade of reflections whose frequency of incidence upon a listener grows with the square of time; the rate of arrival of these reflections dN/dt ≈ 4πc³t²/V (where V is the room volume) approaches once every 5 μs after one second for a 2500 m³ room [2]. Hence an accuracy of reproduction in the microsecond range is necessary to preserve the original acoustic environment’s reverberation.
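
The "once every 5 μs" figure in that passage can be checked with a few lines of Python. This is only a sketch of the arithmetic: the speed of sound (343 m/s) is my assumption, since the quote doesn't say which value was used, and the function name is made up for the example.

```python
import math

# Reflection arrival rate dN/dt ~= 4*pi*c^3*t^2 / V, as in the quoted passage.
# The speed of sound (343 m/s, roughly room temperature) is an assumed value;
# the quote does not state which value was used.
SPEED_OF_SOUND_M_S = 343.0

def reflection_rate(t_s, volume_m3, c=SPEED_OF_SOUND_M_S):
    """Reflections per second arriving at time t_s after the transient."""
    return 4 * math.pi * c**3 * t_s**2 / volume_m3

rate = reflection_rate(t_s=1.0, volume_m3=2500.0)
interval_us = 1e6 / rate
print(f"{rate:.0f} reflections/s, one every {interval_us:.2f} us")
```

With those inputs the interval comes out a little under 5 μs, so the number in the quote at least follows from the formula.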

I’m not saying that these claims are true. I’m just trying to give you my understanding of Kunchur’s claims about the connection between human temporal resolution and the need for sampling rates higher than 44.1.

07-04-11: Almarg
2) By his logic a large electrostatic or other planar speaker should hardly be able to work in a reasonable manner, much less be able to provide good reproduction of high speed transients, due to the widely differing path lengths from different parts of the panel to the listener’s ears. Yet clean, accurate, subjectively "fast" transient response, as well as overall coherence, are major strengths of electrostatic speakers. The reasons are fairly obvious – very light moving mass, that can start and stop quickly and follow the input waveform accurately; no crossover, or at most a crossover at low frequencies in the case of electrostatic/dynamic hybrids; freedom from cone breakup, resonances, cabinet effects, etc. So it would seem that the multiple arrival time issue he appears to have established as being detectable under certain idealized conditions can’t be said on the basis of his paper to have much if any audible significance in typical listening situations.

I think perhaps Kunchur does his own view a disservice by emphasizing the deleterious time-domain effects of speaker drivers with large surface areas, e.g. electrostatic speakers. It seems to me that those deleterious effects might be offset to a large extent by the very characteristics you mention, viz., light mass, minimalistic crossover, etc. But your objection does seem to cast doubt on the significance of the very brief time scales that Kunchur contends are audibly significant.

Having said that, the putative facts about jitter bear on this point in a somewhat paradoxical way. According to some authorities, such as Steve Nugent, jitter is audible at a time scale of PICOseconds. For example, Steve writes:

In my own reference system I have made improvements that I know for a fact did not reduce the jitter more than one or two nanoseconds, and yet the improvement was clearly audible. There is a growing set of anecdotal evidence that indicates that some jitter spectra may be audible well below 1 nanosecond.

That passage is from an article in PFO, which I know you are familiar with. I bring it up, not to defend Kunchur’s claims, but to raise another question that puzzles me:

If jitter really is audible at the order of PICOseconds, does that increase the plausibility of Kunchur’s claim that alterations in a signal at the order of a few MICROseconds are audible?

Again, I don’t quite know how to make sense of all this. I’d be interested to hear your thoughts.

Bryon
This has been a very interesting thread, and I've learned a lot. I have a question that bears on the value of high resolution audio formats, particularly the value of sampling rates higher than 44.1. Here is the question:

Is the preference for high resolution audio formats (24/96, 24/192, etc.) partly attributable to the fact that those formats have better temporal resolution?

I don't know the answer to this question, but it's been on my mind since reading a number of papers with passages like this:

It has also been noted that listeners prefer higher sampling rates (e.g., 96 kHz) than the 44.1 kHz of the digital compact disk, even though the 22 kHz Nyquist frequency of the latter already exceeds the nominal single-tone high-frequency hearing limit fmax ≈ 18 kHz. These qualitative and anecdotal observations point to the possibility that human hearing may be sensitive to temporal errors, τ, that are shorter than the reciprocal of the limiting angular frequency [2πfmax]⁻¹ ≈ 9 μs, thus necessitating bandwidths in audio equipment that are much higher than fmax in order to preserve fidelity.

That quote is from a paper by Milind Kunchur, a researcher on auditory temporal resolution. More can be read in this article from HIFI Critic. Kunchur's research is somewhat controversial, but I have found a number of other peer-reviewed papers that seem to confirm that the limit of human temporal resolution is quite low, on the order of MICROseconds.

If that is true, then part of the advantage of high resolution audio formats might be the fact that they have superior temporal resolution, thereby providing more information about very short alterations in the music, i.e., transients. Or so the argument goes.
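
For anyone who wants to see where the 9 μs in the Kunchur quote comes from, here is a small Python sketch of the reciprocal of the angular frequency, 1/(2πf). The 18 kHz value is from the quote. Applying the same formula to the Nyquist frequencies of 44.1 kHz and 96 kHz audio is my own extension for comparison, not something the paper does in the passage quoted, and the function name is made up.

```python
import math

def time_constant_us(f_hz):
    """Reciprocal of the angular frequency 2*pi*f, in microseconds."""
    return 1e6 / (2 * math.pi * f_hz)

# f_max ~ 18 kHz is taken from the quote.
print(f"18 kHz: {time_constant_us(18_000):.2f} us")

# Assumption for comparison only: the same formula at two Nyquist frequencies.
for nyquist_hz in (22_050, 48_000):
    print(f"{nyquist_hz} Hz: {time_constant_us(nyquist_hz):.2f} us")
```

The 18 kHz case comes out at about 8.8 μs, which is the "≈ 9 μs" in the quote. The other two come out near 7.2 μs and 3.3 μs, which gives a rough sense of the time scales involved but doesn't by itself show that any of them is audible.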

Anyone have an opinion about this?

Bryon
07-05-11: Almarg
...we are not hearing the nanoseconds or picoseconds of timing error itself. What we are hearing are the spectral components corresponding to the FLUCTUATION in timing among different clock periods...

That's what I suspected, Al, but I wasn't sure.

And thanks for your explanation of jitter. I was aware that jitter resulted in frequency modulation, but I didn't know that it was a kind of intermodulation distortion. Your explanation is much appreciated.
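
To convince myself of Al's point, here is a rough Python sketch, not a model of any real DAC. It samples a pure tone with a clock whose timing error wobbles sinusoidally, then measures the sideband that the wobble creates. Every number in it (96 kHz rate, 10 kHz tone, 1 ns peak jitter at 1 kHz) is an assumption chosen for illustration, as are the names.

```python
import cmath
import math

# Illustrative values, all assumptions: a 10 kHz tone sampled at 96 kHz with
# 1 ns (peak) of sinusoidal clock jitter varying at 1 kHz.
FS = 96_000.0
N = 9_600              # 0.1 s of samples, so DFT bins are 10 Hz apart
F_TONE = 10_000.0
F_JITTER = 1_000.0
JITTER_PEAK_S = 1e-9

def bin_amplitude(samples, freq_hz):
    """Amplitude of the component at freq_hz via a single-bin DFT."""
    k = round(freq_hz * len(samples) / FS)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / len(samples))
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)

# Each sample is taken at its nominal time plus a small, fluctuating error.
samples = []
for n in range(N):
    t = n / FS
    t_err = JITTER_PEAK_S * math.sin(2 * math.pi * F_JITTER * t)
    samples.append(math.sin(2 * math.pi * F_TONE * (t + t_err)))

carrier = bin_amplitude(samples, F_TONE)
sideband = bin_amplitude(samples, F_TONE + F_JITTER)
sideband_db = 20 * math.log10(sideband / carrier)
print(f"sideband at {F_TONE + F_JITTER:.0f} Hz: {sideband_db:.1f} dB relative to the tone")
```

The 1 ns timing error shows up as a new spectral component at 11 kHz, around 90 dB below the tone, which agrees with the small-angle estimate of π × tone frequency × peak jitter for phase modulation. So what would be heard, if anything, is that sideband, not the nanosecond itself. Whether a component at that level is audible is a separate question the sketch doesn't answer.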

Shadorne - You may be right that Kunchur's methodology is flawed. I've read a few other experiments on human temporal resolution with similar methodologies, but my memory of them is a little vague. In any case, I have a question about your observation that "Some sample rates are noted for being better than others for reducing audible jitter." I'd be interested to hear a technical explanation for why that is the case.

Finally, I have a general question about high resolution audio that anyone might be able to answer:

My understanding is that the principal advantage of larger bit depth is greater dynamic range. What is the principal advantage of higher sampling rates, if it is not better temporal resolution?
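
On the bit depth half of that, here is a minimal Python sketch of the textbook figure I have in mind: the ratio of a full-scale sine wave to the quantization noise of an ideal N-bit quantizer, often written as roughly 6.02N + 1.76 dB. It assumes an ideal quantizer with no dither or noise shaping, so real converters won't reach these numbers; the function name is made up.

```python
import math

def ideal_dynamic_range_db(bits):
    """Full-scale sine vs. quantization noise for an ideal N-bit quantizer."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")
```

That works out to about 98 dB for 16 bits and about 146 dB for 24 bits.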

Bryon