Is it correct to assume that 24/96 audio would be indistinguishable from CD quality when listened to through speakers with a -3 dB point at 20 kHz and a rapid high-frequency roll-off?
Or, more precisely, that the only benefit comes from the shift from 16 to 24 bits, not the increased sample rate, as the higher-frequency content is filtered out anyhow?
Related to this, what advice would you have for a sub-$5k speaker set with good high-frequency capabilities for 24/96 audio?
"In the past I've heard 1/2" 15 or 30ips tape with Dolby sound spectacular in a studio, but how many of us have access to such a thing?"
I'm referring to modern large format R2R reference recordings I have heard.
Yes, few have or would want access, but I guess my point is that what I have heard here is the reference standard as best I can tell, so CD, if adequate or more, should be able to match it, and I cannot say that it does based on what I have heard (though limited).
"Here's a better question: is there any significant amount of content coming out of the recording industry these days that requires a medium superior to a 16/44 CD?"
That is a good question. Again, for true hardcore audiophiles only, I think there may be some but not much, but I am not certain. Quality large scale orchestrated works with massed strings are the type I question most.
For 98% of the population (maybe more) I think redbook CD covers all the needed bases adequately at least. That's pretty good!
"Sounds like the consensus is that the original CD redbook format engineers did a more than adequate job, at least in theory."
Well, I would have said the red book CD is "just adequate". I think reducing the word length or the sampling rate might have audible effects, so "just adequate" comes to mind.
As for comparisons to "other media", are you talking about analog? For me at least, vinyl doesn't come close, no matter what you spend. Analog tape can sound very, very good, but except for our own master tapes, where would one get source material? In the past I've heard 1/2" 15 or 30ips tape with Dolby sound spectacular in a studio, but how many of us have access to such a thing?
Here's a better question: is there any significant amount of content coming out of the recording industry these days that requires a medium superior to a 16/44 CD? Perhaps some rare examples in the classical venue, but it seems like none in jazz, rock, pop, country, new age, or whatever. Or am I wrong?
Sounds like the consensus is that the original CD redbook format engineers did a more than adequate job, at least in theory.
So does that mean that when we hear deficiencies in specific redbook CDs compared to other formats (say R2R or very good vinyl even) that it is because of poor execution somewhere in the implementation, either in the recording or playback process or equipment, or most likely even both?
I like to think so, but I have not yet heard the near perfectly created CD on the near perfectly executed system, in a viable test scenario, compared to other high quality formats in a way that would confirm this, so I am not so sure reality reflects the theory in practice quite yet.
Has anybody else heard something specifically that has them convinced?
On my system, I think the issue is a wash, but I have done some imperfect a/b comparisons on very high end dealer reference systems where it was not, especially in comparison to R2R and with better large scale orchestral recordings involving massed strings in particular.
What is the principal advantage of higher sampling rates, if it is not better temporal resolution?
Yes, as Shadorne indicated the principal advantage is that it dramatically relaxes the rolloff requirements for anti-aliasing filters (in the recording process) and reconstruction filters (in the playback process). Or it makes it possible to avoid the use of techniques that have been used to relax those requirements, which have their own tradeoffs (e.g., oversampling + noise shaping).
It should be kept in mind that not only will 44.1kHz sampling be unable to capture signal frequencies at or above 22.05kHz, but the a/d converter used in the recording process must not be exposed to those frequency components. Otherwise "aliasing" will occur, resulting in those ultrasonic frequencies appearing in the digital data as audible frequencies.
Therefore an a/d converter that doesn't use oversampling or other special techniques must be preceded by a low pass filter that is flat to 20kHz, but has rolled off to the point of inaudibility in about 1/10th of an octave, at 22.05kHz. That is an EXTREMELY sharp rolloff, and, besides being expensive to manufacture, that kind of filter can have the sonic effects Kijanki described above in his post of 6/27, and the effect described in my second post of 6/30.
In contrast, 96kHz sampling would make it possible to allow more than a full octave for the same rolloff to occur (at 48kHz rather than 22.05kHz).
Similar considerations apply to the playback process, with respect to the "reconstruction filter," which refers to a low pass filter used to eliminate the stepped character of the d/a converter device's output.
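To put rough numbers on the filter argument above, here is a minimal Python sketch. It is not from any poster, and the function names are made up for illustration. It folds an ultrasonic tone back into the baseband to show what aliasing does, and it measures how much room an anti-aliasing filter has between a 20 kHz passband edge and the Nyquist frequency, assuming ideal sampling.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears after ideal sampling at fs_hz."""
    f_mod = f_hz % fs_hz
    # Anything above fs/2 folds back around the Nyquist frequency.
    return min(f_mod, fs_hz - f_mod)

def transition_octaves(passband_edge_hz, fs_hz):
    """Octaves available between the passband edge and the Nyquist frequency."""
    return math.log2((fs_hz / 2) / passband_edge_hz)

for fs in (44_100, 96_000):
    print(fs, "Hz:", round(transition_octaves(20_000, fs), 2), "octaves of rolloff room")

# An unfiltered 25 kHz component sampled at 44.1 kHz lands in the audible band.
print(alias_frequency(25_000, 44_100), "Hz")
```

With these assumptions the 44.1 kHz case leaves roughly a seventh of an octave, while 96 kHz leaves about an octave and a quarter, which is in line with the figures given in the post.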
What is the principal advantage of higher sampling rates, if it is not better temporal resolution?
None above redbook CD, except that it allows cheaper and better filtering, which may very slightly improve the audible band. However, higher sample rates do allow you to go to one-bit resolution (like the SACD format, which is a DSD stream, but SACD has very high levels of out-of-band noise, so to be honest I am not sure I accept that it is even as good as 24-bit/96).
07-05-11: Almarg ...we are not hearing the nanoseconds or picoseconds of timing error itself. What we are hearing are the spectral components corresponding to the FLUCTUATION in timing among different clock periods...
That's what I suspected, Al, but I wasn't sure.
And thanks for your explanation of jitter. I was aware that jitter resulted in frequency modulation, but I didn't know that it was a kind of intermodulation distortion. Your explanation is much appreciated.
Shadorne - You may be right that Kunchur's methodology is flawed. I've read a few other experiments on human temporal resolution with similar methodologies, but my memory of them is a little vague. In any case, I have a question about your observation that "Some sample rates are noted for being better than others for reducing audible jitter." I'd be interested to hear a technical explanation for why that is the case.
Finally, I have a general question about high resolution audio that anyone might be able to answer:
My understanding is that the principal advantage of larger bit depth is greater dynamic range. What is the principal advantage of higher sampling rates, if it is not better temporal resolution?
I appreciate your questions. You are definitely curious enough to look into this and I commend you on your interest.
However, poor Kunchur seems a very confused individual.
His test simply shows how two pure tones can interfere with each other in a way that becomes audible. However, his conclusions are completely bogus. The listener is NOT hearing temporal time-domain effects of microseconds. The listener is actually hearing changes in the combined resultant waveform which has been altered by offsetting one source to the other (combined - meaning both waves and including all room reflections).
As I explained, this will lead to TOTAL destructive interference of the primary direct signal as heard by the listener at an offset of 2.5 cm. This is like a signal that is TOTALLY out of phase. The direct sound will be inaudible and all the listener hears is all the sound around the room (reflected sounds). Since we detect the direction of sound from the relative timing of the wave front (or nerve bundle triggers) across each ear, we lose that ability when a signal is out of phase.
Poor Kunchur is conflating things in a bad way - this is bad science.
However, his remarks about speaker alignment and panels are partly valid. It is almost certain that large radiating surfaces can cause the kind of interference at certain frequencies that he achieved in this experiment. This manifests itself in a speaker response that has many suckouts across the frequency spectrum. In fact the anechoic response of a large panel will look like a comb with many total suckouts across the frequency range. The result is that some sounds and some frequencies will not be as tightly imaged as with a point source speaker. Since most sounds are made up from many harmonics this effect will not be complete, but on the whole it will lead to a larger, more diffuse soundstage, with some sounds imaging precisely and others more diffuse than with a point source speaker. There is an audio tool called a flanger that is used for electric guitar - it achieves a similar effect but even stronger.
Also, jitter is not audible in the sense you describe. It is audible when non-random jitter over many thousands or hundreds of thousands of samples combines in a way that introduces new frequencies. We hear those new frequencies that are created by the non-random modulation of the clock (random jitter is just white noise at very low inaudible levels).
We are totally UNABLE to hear jitter effects on a few samples.
Your question about the audibility of jitter that is on a time scale far shorter than the temporal resolution of our hearing is a good one. The answer is that we are not hearing the nanoseconds or picoseconds of timing error itself. What we are hearing are the spectral components corresponding to the FLUCTUATION in timing among different clock periods (actually, among different clock half-periods, since both the positive-going and negative-going edges of S/PDIF and AES/EBU signals are utilized), and their interaction with the spectral components of the audio.
For example, assume that the worst case jitter for a particular setup amounts to +/- 1 ns. The amount of mistiming for any given clock period will fluctuate within that maximum possible 1 ns of error, with the fluctuations occurring at frequencies that range throughout the audible spectrum (and higher). That is all referred to as the "jitter spectrum," which will consist of very low level broadband noise (corresponding to random fluctuation) plus larger discrete spectral components corresponding to specific contributors to the jitter.
Think of it as timing that varies within that +/- 1 ns or so range of error, but which varies SLOWLY, at audible rates.
All of those constituents of the jitter spectrum will in turn intermodulate with the audio data, resulting in spurious spectral components at frequencies equal to the sums of and the differences between the frequencies of the spectral components of the audio and the jitter.
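As a rough illustration of the sum-and-difference behaviour described above, here is a small Python sketch. It assumes something much simpler than a real jitter spectrum: a single sinusoidal timing error of a given peak size acting on a single audio tone, treated with the standard small-index phase-modulation approximation. The function name and the example values are hypothetical, not taken from the thread or from the paper mentioned below.

```python
import math

def jitter_sidebands(f_audio_hz, f_jitter_hz, jitter_peak_s):
    """Sideband frequencies and level for sinusoidal jitter on a pure tone.

    Sinusoidal timing error of peak size jitter_peak_s phase-modulates the
    tone with index beta = 2*pi*f_audio*jitter_peak. For small beta, each
    first-order sideband has amplitude beta/2 relative to the tone.
    """
    beta = 2 * math.pi * f_audio_hz * jitter_peak_s
    level_db = 20 * math.log10(beta / 2)
    lower = f_audio_hz - f_jitter_hz   # difference frequency
    upper = f_audio_hz + f_jitter_hz   # sum frequency
    return lower, upper, level_db

# Example: 10 kHz tone, 1 ns peak timing error that varies at a 1 kHz rate.
low, high, level = jitter_sidebands(10_000, 1_000, 1e-9)
print(low, high, round(level, 1))
```

Under these assumptions the spurious components sit at 9 kHz and 11 kHz, roughly 90 dB below the tone. The level scales with both the audio frequency and the size of the timing error, so a real jitter spectrum with many components would produce many such pairs at different levels.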
If you haven't seen it, you'll find a lot of the material in this paper to be of interest (interspersed with some really heavy-going theoretical stuff, which can be skimmed over without missing out on the basic points):
Malcolm Hawksford, btw, is a distinguished British academician who has researched and written extensively on audiophile-related matters.
One interesting point he makes is that the jitter spectrum itself, apart from the intermodulation that will occur between it and the audio, will typically include spectral components that are not only at audible frequencies, but that are highly correlated with the audio! He also addresses at some length the question of how much jitter may be audible.
So to answer your last question first, no, I don't think that the audibility of jitter on a nanosecond or picosecond scale has a relation to the plausibility of Kunchur's claim.
As far as point no. 1 in my previous post is concerned, yes I think that the quote you provided about closely spaced peaks being merged together does seem to provide a logical connection between his experimental results and a rationale for hi rez sample rates. It hadn't occurred to me to look at it that way. So that point would seem to be answered.
Well, Bryon, that was a very interesting article. I'm not sure what to think after reading it... is this yet another investigation into a micro-problem that doesn't really affect music reproduction, or is it a significant factor? I certainly don't know. I can't even venture a guess.
Anyway, Kunchur admits to listening to cassettes. I haven't heard cassettes for many years, but 16/44 CDs must sound like a revelation by comparison. ;-)
Thanks for your thoughtful responses. Everything you guys said makes sense to me, but I do have some additional thoughts...
07-04-11: Almarg 1)He has apparently established that listeners can reliably detect the difference between a single arrival of a specific waveform, and two arrivals of that waveform that are separated by a very small number of microseconds. I have difficulty envisioning a logical connection between that finding, though, and the need for hi rez sample rates. There may very well be one, but I don't see it.
I believe Kunchur addresses this in this document, in which he says:
For CD, the sampling period is 1/44100 ~ 23 microseconds and the Nyquist frequency fN for this is 22.05 kHz. Frequencies above fN must be removed by anti-alias/low-pass filtering to avoid aliasing. While oversampling and other techniques may be used at one stage or another, the final 44.1 kHz sampled digital data should have no content above fN. If there are two sharp peaks in sound pressure separated by 5 microseconds (which was the threshold upper bound determined in our experiments), they will merge together and the essential feature (the presence of two distinct peaks rather than one blurry blob) is destroyed. There is no ambiguity about this and no number of vertical bits or DSP can fix this. Hence the temporal resolution of the CD is inadequate for delivering the essence of the acoustic signal (2 distinct peaks).
In essence, I understand him to be saying that the temporal resolution of human hearing is around 6μs. But the temporal resolution of the 44.1 sampling rate is around 11μs. Since the temporal resolution of human hearing is better than the temporal resolution of 44.1 recordings, those recordings fail to accurately represent very brief signals that are both audible and musically significant. For example, Kunchur says:
In the time domain, it has been demonstrated that several instruments (xylophone, trumpet, snare drum, and cymbals) have extremely steep onsets such that their full signal levels, exceeding 120 dB SPL, are attained in under 10 μs…
He also suggests that the temporal resolution of 44.1 recordings might be inadequate to fully represent the reverberation of the live event:
A transient sound produces a cascade of reflections whose frequency of incidence upon a listener grows with the square of time; the rate of arrival of these reflections dN/dt ≈ 4πc³t²/V (where V is the room volume) approaches once every 5 μs after one second for a 2500 m³ room [2]. Hence an accuracy of reproduction in the microsecond range is necessary to preserve the original acoustic environment's reverberation.
I'm not saying that these claims are true. I'm just trying to give you my understanding of Kunchur's claims about the connection between human temporal resolution and the need for sampling rates higher than 44.1.
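For anyone who wants to check the arithmetic in the quoted passages, here is a minimal Python sketch of the two figures involved: the CD sampling period and the spacing between reflections implied by the quoted dN/dt formula. The speed of sound (343 m/s) is my own assumption, since the quote does not state the value used, and the function names are just for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for room-temperature air

def sample_period_us(fs_hz):
    """Time between successive samples, in microseconds."""
    return 1e6 / fs_hz

def reflection_interval_us(t_s, room_volume_m3, c=SPEED_OF_SOUND_M_S):
    """Average spacing between reflections t_s seconds after a transient,
    using the quoted rate dN/dt = 4*pi*c^3*t^2 / V."""
    arrivals_per_second = 4 * math.pi * c**3 * t_s**2 / room_volume_m3
    return 1e6 / arrivals_per_second

print(round(sample_period_us(44_100), 2), "us between CD samples")
print(round(reflection_interval_us(1.0, 2500.0), 2), "us between reflections")
```

With the assumed speed of sound, the numbers come out close to the 23 μs and 5 μs given in the quotes. This only confirms the arithmetic, not the conclusion drawn from it.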
07-04-11: Almarg 2)By his logic a large electrostatic or other planar speaker should hardly be able to work in a reasonable manner, much less be able to provide good reproduction of high speed transients, due to the widely differing path lengths from different parts of the panel to the listener’s ears. Yet clean, accurate, subjectively "fast" transient response, as well as overall coherence, are major strengths of electrostatic speakers. The reasons are fairly obvious – very light moving mass, that can start and stop quickly and follow the input waveform accurately; no crossover, or at most a crossover at low frequencies in the case of electrostatic/dynamic hybrids; freedom from cone breakup, resonances, cabinet effects, etc. So it would seem that the multiple arrival time issue he appears to have established as being detectable under certain idealized conditions can’t be said on the basis of his paper to have much if any audible significance in typical listening situations.
I think perhaps Kunchur does his own view a disservice by emphasizing the deleterious time-domain effects of speaker drivers with large surface areas, e.g. electrostatic speakers. It seems to me that those deleterious effects might be offset to a large extent by the very characteristics you mention, viz., light mass, minimalistic crossover, etc. But your objection does seem to cast doubt on the significance of the very brief time scales that Kunchur contends are audibly significant.
Having said that, the putative facts about jitter bear on this point in a somewhat paradoxical way. According to some authorities, such as Steve Nugent, jitter is audible at a time scale of PICOseconds. For example, Steve writes:
In my own reference system I have made improvements that I know for a fact did not reduce the jitter more than one or two nanoseconds, and yet the improvement was clearly audible. There is a growing set of anecdotal evidence that indicates that some jitter spectra may be audible well below 1 nanosecond.
That passage is from an article in PFO, which I know you are familiar with. I bring it up, not to defend Kunchur's claims, but to raise another question that puzzles me:
If jitter really is audible at the order of PICOseconds, does that increase the plausibility of Kunchur's claim that alterations in a signal at the order of a few MICROseconds are audible?
Again, I don't quite know how to make sense of all this. I'd be interested to hear your thoughts.
"Irv, keep in mind that it is generally accepted that signal can be perceived at levels that are significantly below the level of random broadband noise that may accompany the signal. 15db or more below, iirc. So amplifier noise floor is not really a "floor" below which everything is insignificant."
Maybe, but it is very difficult to believe this is the case when listening to music or other complex sounds, like movie dialog or foley. I've always been leery of effects 70db or more below the music level, regardless of the component in question.
This shows how good we are at hearing sounds and has nothing to do with temporal resolution.
The wavelength at 7 kHz is about 5 cm. Therefore, in order to get the direct sound completely out of phase at the listener, one need only move one speaker back by 2.5 cm (half a wavelength). This will result in the direct sound being zero and will probably change the SPL enough to be clearly audible. The fact that only a 2.9 mm movement was audible suggests that reflections may also have played a role here.
The use of a pure signal of a single tone with no (audible) harmonics can often give surprising results! This is not reflective of musical instruments that have many harmonics, so it is hard to draw any conclusion other than that a test tone produces an audible result. Anyway, my money is that there is enough of an amplitude difference here to make it audible in the case of a pure test tone. A pure test tone will fluctuate as you move around the room (you get peaks and suckouts depending on how it all adds up: reflection and direct sound).
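Here is a small Python sketch of the interference arithmetic in this post. It is a simplification under stated assumptions: two equal-level sources playing the same pure tone, direct sound only, no reflections, and a speed of sound of 343 m/s. None of that is taken from the experiment itself, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed

def wavelength_cm(f_hz, c=SPEED_OF_SOUND_M_S):
    """Wavelength of a tone in air, in centimetres."""
    return 100 * c / f_hz

def summed_level_db(f_hz, offset_m, c=SPEED_OF_SOUND_M_S):
    """Level of two equal tones summed with a path offset, relative to the
    level when both arrive perfectly in phase (direct sound only)."""
    phase = 2 * math.pi * f_hz * offset_m / c
    magnitude = abs(math.cos(phase / 2))
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")

print(round(wavelength_cm(7_000), 2), "cm wavelength at 7 kHz")
print(round(summed_level_db(7_000, 0.0029), 2), "dB for a 2.9 mm offset")
print(summed_level_db(7_000, 0.0245) < -100, "(deep null at half a wavelength)")
```

In this idealised direct-sound model a half-wavelength offset gives a near-total null, while a 2.9 mm offset changes the summed level by only a fraction of a dB. A real room adds reflections on top of that, so the sketch says nothing definite about what the listeners actually heard.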
Interesting question, and an interesting paper, which I read through. It strikes me as very intelligently and knowledgeably written, and I see no obvious flaws in the details he presents. And intuitively it does strike me as plausible that our ability to resolve timing-related parameters might be somewhat better than what would be suggested by the bandwidth limitations of our hearing mechanisms.
However, looking at his paper from a broader perspective I have several problems with it:
1)He has apparently established that listeners can reliably detect the difference between a single arrival of a specific waveform, and two arrivals of that waveform that are separated by a very small number of microseconds. I have difficulty envisioning a logical connection between that finding, though, and the need for hi rez sample rates. There may very well be one, but I don't see it.
2)By his logic a large electrostatic or other planar speaker should hardly be able to work in a reasonable manner, much less be able to provide good reproduction of high speed transients, due to the widely differing path lengths from different parts of the panel to the listener’s ears. Yet clean, accurate, subjectively "fast" transient response, as well as overall coherence, are major strengths of electrostatic speakers. The reasons are fairly obvious – very light moving mass, that can start and stop quickly and follow the input waveform accurately; no crossover, or at most a crossover at low frequencies in the case of electrostatic/dynamic hybrids; freedom from cone breakup, resonances, cabinet effects, etc. So it would seem that the multiple arrival time issue he appears to have established as being detectable under certain idealized conditions can’t be said on the basis of his paper to have much if any audible significance in typical listening situations.
3)More generally, it seems to me that there are so many theoretical, practical, recording-dependent, and equipment-dependent variables that would have to be reckoned with and controlled in any attempt to make a meaningful comparison involving hi rez vs. redbook sample rates, that reaching a definitive conclusion about the degree to which this particular factor may be audibly significant under real-world listening conditions is probably not possible.
There is no solid evidence for this - so it is indeed controversial. If a mere few microseconds were important then speaker and listener position would be dependent down to a millimeter or less than a tenth of an inch. It is generally accepted that 1 msec is the point at which time differences become audible (roughly 1 foot). Our ears are roughly 6 to 8 inches apart. Since temporal differences are detected by the difference in arrival at each ear - this all suggests that our "resolution" is close to that length, which is about 0.5 msec in time (at the speed of sound in air).
What these findings may be related to is "jitter" - it has been shown mathematically that non-random time errors can produce audible "sidebands" around musical signals and that jitter of 1 microsecond can be quite audible due to our ability to hear these non-musical sounds or tones or sidebands. If you increase the sample rate then you will change the way jitter affects the sound - a significantly higher sample rate would likely reduce the deleterious effects of jitter. Some sample rates are noted for being better than others for reducing audible jitter. Benchmark found that 110 kHz worked better than other rates with the DAC chip they use.
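The distance and time figures in the paragraph above are easy to check with a couple of lines of Python. This is only a sketch that converts between delay and path length, assuming sound travels at 343 m/s; the function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed
INCH_M = 0.0254

def delay_to_distance_mm(delay_s, c=SPEED_OF_SOUND_M_S):
    """Path length sound covers in delay_s seconds, in millimetres."""
    return 1000 * c * delay_s

def distance_to_delay_ms(distance_m, c=SPEED_OF_SOUND_M_S):
    """Time sound needs to cover distance_m metres, in milliseconds."""
    return 1000 * distance_m / c

print(round(delay_to_distance_mm(5e-6), 2), "mm for 5 microseconds")
print(round(delay_to_distance_mm(1e-3)), "mm for 1 millisecond")
for inches in (6, 8):
    print(inches, "in ->", round(distance_to_delay_ms(inches * INCH_M), 2), "ms")
```

Under that assumption a few microseconds corresponds to a millimetre or two, 1 msec to a little over a foot, and an ear spacing of 6 to 8 inches to roughly 0.4 to 0.6 msec, consistent with the rough figures in the post.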
This has been a very interesting thread, and I've learned a lot. I have a question that bears on the value of high resolution audio formats, particularly the value of sampling rates higher than 44.1. Here is the question:
Is the preference for high resolution audio formats (24/96, 24/192, etc.) partly attributable to the fact that those formats have better temporal resolution?
I don't know the answer to this question, but it's been on my mind since reading a number of papers with passages like this:
It has also been noted that listeners prefer higher sampling rates (e.g., 96 kHz) than the 44.1 kHz of the digital compact disk, even though the 22 kHz Nyquist frequency of the latter already exceeds the nominal single-tone high-frequency hearing limit fmax ∼ 18 kHz. These qualitative and anecdotal observations point to the possibility that human hearing may be sensitive to temporal errors, τ, that are shorter than the reciprocal of the limiting angular frequency [2πfmax]⁻¹ ≈ 9 μs, thus necessitating bandwidths in audio equipment that are much higher than fmax in order to preserve fidelity.
That quote is from a paper by Milind Kunchur, a researcher on auditory temporal resolution. More can be read in this article from HIFI Critic. Kunchur's research is somewhat controversial, but I have found a number of other peer reviewed papers that seem to confirm that the limit of human temporal resolution is quite low, on the order of MICROseconds.
If that is true, then part of the advantage of high resolution audio formats might be the fact that they have superior temporal resolution, thereby providing more information about very short alterations in the music, i.e., transients. Or so the argument goes.
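The 9 μs figure in the quoted passage is the reciprocal of the angular frequency at the stated hearing limit. A one-function Python sketch reproduces it, using the 18 kHz value from the quote; the function name is made up for illustration.

```python
import math

def reciprocal_angular_frequency_us(f_max_hz):
    """1 / (2*pi*f_max), expressed in microseconds."""
    return 1e6 / (2 * math.pi * f_max_hz)

print(round(reciprocal_angular_frequency_us(18_000), 2), "us at 18 kHz")
```

That is the time scale the quoted argument treats as the conventional limit. Whether hearing is sensitive to anything shorter is the contested claim, and the calculation does not address it.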
That said, I think we are all in agreement that the main usefulness of 24 bits is in the creation of the recording.
Agreed.
However, I would add that high resolution recordings are targeted at audiophiles - so this new high resolution media is useful in that you tend to get a better quality recording that has NOT been heavily compressed for mass consumption. So they ARE useful to audiophiles but not so much from the "improved resolution" but mostly because audio that gets formatted this way tends to be a better quality master rather than a master intended for restaurant, pub, iPod & car FM radio play.
Irv, keep in mind that it is generally accepted that signal can be perceived at levels that are significantly below the level of random broadband noise that may accompany the signal. 15db or more below, iirc. So amplifier noise floor is not really a "floor" below which everything is insignificant.
Also, quantization noise is significantly correlated with the signal, at low signal levels, and is therefore perceived as distortion rather than noise. Dithering will minimize that effect, but it has its limitations and my understanding is that it is often not properly applied.
That said, I think we are all in agreement that the main usefulness of 24 bits is in the creation of the recording.
"Kijanki, are you implying that 24 bit data words have a "finer grain" than 16 bit data words? That each bit represents a smaller incremental signal level? "
"That's the basic reason to use more bits in each sample in digital signal processing of any kind, isn't it?"
That isn't my understanding of how it works, Mapman. Each bit of word length corresponds to 6.02db of dynamic range, or, technically, s/n ratio. So, with 16 bit words you get about 96db of dynamic range, and with 24 bit words 144db. And, of course, 65,536 voltage levels with 16 bits, and 16,777,216 with 24bit words. So 24 bit is "finer grain"?
Yes and no. Yes, there are 16M levels, but there's so much more voltage range to cover. If you reference the maximum level to, say, the 2v max line level used in consumer audio, and that's 24 1s in a row, then all zeros will be 144db below 2v, which would take cryogenic circuits or whatever to achieve. That means you're wasting the bits below the resolution of modern amplification systems, which is probably something like 100db below 2v of power amplifier output, including all amplification stages (I'm being very generous), which means you're wasting 44db of dynamic range, or about 7 bits of word length. So that means you're probably using about 17 out of 24 bits in a real system. And, of course, I'm probably generous by several db of s/n ratio in a real system, so 16 bits isn't far off of what is the resolution limit in a home system, and "finer grain", meaning a better s/n ratio, won't be audible by most mortals.
What 24 bit words are good for is eliminating digital clipping in the recording studio. There's no such thing as a 144db peak in music. :-)
So unless I'm completely misguided (always possible) 24 bit audio isn't really "finer grain" in the 96db of dynamic range that 16 bit audio can encode. 24 bit just goes higher or lower, or a bit of both.
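The word-length arithmetic in this post can be reproduced with a short Python sketch. It uses the same 6.02 dB-per-bit rule of thumb, and the 100 dB system noise floor is the poster's own rough estimate rather than a measured figure. The function names are just for illustration.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of word length

def dynamic_range_db(bits):
    """Theoretical dynamic range of a linear PCM word of the given length."""
    return bits * DB_PER_BIT

def quantization_levels(bits):
    """Number of distinct sample values a word of the given length can hold."""
    return 2 ** bits

def usable_bits(system_range_db):
    """Word length that matches a playback chain with the given s/n range."""
    return system_range_db / DB_PER_BIT

for bits in (16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB,",
          quantization_levels(bits), "levels")
print(round(usable_bits(100), 1), "bits cover a 100 dB playback chain")
```

With a 100 dB chain, this gives a little under 17 usable bits, which matches the "about 17 out of 24" estimate above.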
Glad to hear we can all agree. Sony and Philips engineers did a great job with redbook CD, it would indeed be hard to go against all their research.
I agree that transients close to the Nyquist are going to be the most challenging to reproduce faithfully; however, there is really not much in the way of sounds that one can call music above 15 kHz anyway.
Shadorne, I am in essential agreement with your last post. As I said earlier:
An infinitely long series of samples is required for the mathematics to work out perfectly. The consequences of that will be most significant for spectral components that are transient and that approach the Nyquist frequency (i.e., half the sample rate).
The extent to which that may be audibly significant on most recordings is probably conjectural. The Wilson Audio cd I referenced, among many others, leads me to believe that in general it is not a major factor as a practical matter.
I particularly second your statement that:
... the graphical representation of waveforms and the "digital staircase" form one of the biggest and most enduring audiophile myths that analog is inherently better than digital.
BTW, the following excerpts from the technical notes accompanying the Wilson Audio cd I referenced (which I indicated provides the best reproduction of solo piano in my experience) may be of general interest:
The recorded perspective of the piano in this recording is close, as though the 9' Hamburg Steinway is being played for you in your living room. Of course the actual recording was not made in a living room! Instead, the great room of Lucasfilm's Skywalker Ranch, with its incredibly low noise floor and fully adjustable acoustics, was used.... A pair of Sennheiser MKH-20 omni microphones were employed ... amplified by two superb pure class-A microphone preamps custom-built for Wilson Audio by John Curl. MIT cable carried the balanced line level signal to Wilson Audio's Ultramaster 30 ips analog recorder. Subsequent digital master tapes were made through the Pygmy A/D converter on a Panasonic SV-3700.
I think redbook CD done perfectly correctly both in recording and playback does fit the bill very well as designed as Shadorne indicated.
The problem is more often the difference between design and theory and its realization in products, which is imperfect.
In order for hi res digital to make a difference, quality standards for accuracy have to be raised as well from end to end. To do that is relatively expensive still, I believe, though technology advances and should become more practical and affordable to achieve sometime down the road.
Not to say there may not be a practical advantage today for some, but this is very marginal at best, very expensive, and still probably not where I would want it to be in terms of technological maturity for me as a fairly average Joe audio buff to buy in.
I do need to download some hi res files sometime soon though and actually test out the waters a bit (no pun intended).
Most sounds last at least a hundredth of a second or longer. My point is that even for a 15 kHz sound you are likely to be hearing 15000/100 = 150 cycles. It is irrelevant that the amplitude of a few cycles may not be graphically represented perfectly. The point is that the context we are talking about is hearing, not the graphical presentation of a waveform.
Although Kijanki is right about the graphical accuracy, my point is that, as regards human hearing and music, this is not so relevant. In essence the engineers at Sony and Philips did a thorough job when they came up with redbook CD! Perhaps if redbook CD was not as good as it is then SACD would not have failed. The problem is that SACD and other higher resolution formats are very much into diminishing returns compared to a well produced CD.
I would add that the graphical representation of waveforms and the "digital staircase" form one of the biggest and most enduring audiophile myths that analog is inherently better than digital. In fact, most of the benefits of analog come from the added distortion that is pleasing to the ear - analog tape machines are wonderful devices for audio compression (removing dynamic range)!
"Kijanki, are you implying that 24bit data words have a "finer grain" than 16bit data words? That each bit represents a smaller incremental signal level? "
That's the basic reason to use more bits in each sample in digital signal processing of any kind, isn't it?
Kijanki, are you implying that 24bit data words have a "finer grain" than 16bit data words? That each bit represents a smaller incremental signal level?
Kijanki & Shadorne, you're both basically right but you're referring to different things.
Shadorne is alluding to the fact that a low pass reconstruction filter will smooth out the steps and restore an essentially perfect sine wave, if the original analog input was a sine wave at a frequency slightly less than the Nyquist rate (or lower). Of course, the filter itself may have significant side effects, but that is another subject.
Kijanki was alluding to the fact that if the analog input is a brief transient lasting for a limited number of samples and having spectral components approaching the Nyquist frequency, then the mathematics won't work out ideally no matter how ideal the reconstruction process is. Which is correct, although as I said earlier whether or not that may be audibly significant with worst case material (e.g., high frequency percussion) is probably a matter of conjecture. Admittedly, the video does not directly relate to Kijanki's point.
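To put rough numbers on the two positions, here is a minimal sketch of ideal sinc (Whittaker-Shannon) reconstruction of a 10 Hz sine sampled at 21 Hz, the rates mentioned in connection with the video. The function names and the window sizes (plus or minus 10 and plus or minus 2000 samples) are assumptions chosen only for illustration, and the sketch says nothing about real-world filters or about audibility. It only compares how close the reconstruction gets when many samples are available versus only a few.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t, freq_hz, rate_hz, half_window):
    """Sinc interpolation at time t, using only the samples
    n = -half_window .. +half_window of a sine at freq_hz."""
    u = t * rate_hz  # time measured in sample periods
    total = 0.0
    for n in range(-half_window, half_window + 1):
        sample = math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
        total += sample * sinc(u - n)
    return total

def worst_error(freq_hz, rate_hz, half_window):
    """Largest reconstruction error at a few midpoints between samples
    near the centre of the window (full scale = 1.0)."""
    worst = 0.0
    for k in range(-3, 4):
        t = (k + 0.5) / rate_hz
        true_value = math.sin(2.0 * math.pi * freq_hz * t)
        err = abs(reconstruct(t, freq_hz, rate_hz, half_window) - true_value)
        worst = max(worst, err)
    return worst

# Hypothetical window sizes: a short burst versus a long steady tone.
short_err = worst_error(10.0, 21.0, 10)
long_err = worst_error(10.0, 21.0, 2000)
print(f"+/-10 samples:   worst error {short_err:.4f} of full scale")
print(f"+/-2000 samples: worst error {long_err:.4f} of full scale")
```

With a long run of samples the error near the centre becomes very small, which is Shadorne's point; with only a handful of samples this close to Nyquist it stays clearly larger, which is Kijanki's.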
As far as the relation between low sample rates and quantization noise is concerned, while lower sample rates would obviously result in coarser steps in the sampled (unreconstructed) waveform, I think that Irv is basically correct to the extent that the reconstruction process can be accomplished ideally. However, given the possible effects on high frequency transients that we've been discussing, that may result from having a limited number of samples, and given the non-idealities of real-world filters, I suppose there could be some second-order relation between sample rate and quantization noise. It's been a long time since I took the relevant courses. :-)
Shadorne - You have no clue. Filter will smooth-out the steps but will never remove few Hz modulation that was shown at 21Hz sampling rate. Al, please help me here or I'm going to kill myself.
Irvrobinson - I'm not talking about frequency range of our hearing but rather resolution of our hearing similar to number of shades of gray you can distinguish taking into consideration adverse conditions like ambient noise, system noise, THD, IMD etc.
I don't care anymore to defend myself from such attacks. You guys have no basic education in electronics and post nonsense just to keep arguing. Signing off.
That video is wrong. It is showing a stair step signal which is NOT what the output of a DAC would look like. The output will be smoothed by a filter in order to eliminate all that horrible spurious high frequency signal from the stair steps. The output filter will remove the stair step and restore the sine wave so that the signals are much more alike - even a little above Nyquist - so there is absolutely no need to go to 100Hz sampling to properly render a 10Hz sine wave.
Warning: not everything you see from universities is accurate.
Also it is WRONG to compare signals in this way. We hear frequencies NOT the waveform as presented graphically! The closeness of the waveforms as presented graphically is NOT a proxy for how close alike they will sound!
"I think that our hearing ability ends up slightly above 16-bit perhaps 18-20bits but I'm more concerned with sampling rate because low sampling rate in addition to phase shifts in steep low pass filters increases quantization noise (or size of square steps to make it simpler)."
It's statements like that, Kijanki, that make me wonder if you know what you're talking about. The width of the data word has nothing to do with our hearing ability. How many bits per word determines how many loudness levels there are. It's the sampling rate in KHz that determines how high the frequency response goes. You know that, right?
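The two relationships named here can be written down directly. This is only a sketch: `format_summary` is a made-up helper name, and it reports nothing more than the number of quantization levels (2 to the power of the word length) and the Nyquist limit (half the sample rate) for the two formats under discussion.

```python
def format_summary(bits, sample_rate_hz):
    """Return (number of quantization levels, Nyquist frequency in Hz)."""
    levels = 2 ** bits               # word length sets the number of levels
    nyquist_hz = sample_rate_hz / 2  # sample rate sets the upper frequency limit
    return levels, nyquist_hz

for bits, rate in [(16, 44100), (24, 96000)]:
    levels, nyquist = format_summary(bits, rate)
    print(f"{bits}-bit / {rate} Hz: {levels} levels, content up to {nyquist:.0f} Hz")
```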
Bob - Thanks for the link. I suspect that THD is a dominating factor at higher power. Noise issue itself is non-existent in my opinion because if I cannot hear anything in a silent room at full power (dead silent) I don't worry. Many amps with similar 80dB THD+N performance are showing -120dB noise floor on the other graphs. Also, small amount of noise helps to increase resolution - technique known as dithering widely used in photography. I would be more concerned with THD and it doesn't look good.
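The remark about dither can be illustrated with a small sketch. The assumptions are mine, not the poster's: a steady level of 0.3 of one quantization step, triangular (TPDF) dither made from two uniform random values, and averaging over many samples as a stand-in for the ear or a meter integrating over time.

```python
import math
import random

def quantize(x):
    """Round to the nearest whole step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def average_quantized(value, count, dither, rng):
    """Quantize the same value `count` times and return the mean result."""
    total = 0
    for _ in range(count):
        noise = 0.0
        if dither:
            # TPDF dither: sum of two uniform values, spanning +/-1 LSB
            noise = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        total += quantize(value + noise)
    return total / count

rng = random.Random(1)  # fixed seed so the run is repeatable
level = 0.3             # hypothetical signal level, below half a step
plain = average_quantized(level, 20000, False, rng)
dithered = average_quantized(level, 20000, True, rng)
print(f"without dither: {plain:.3f}   with dither: {dithered:.3f}")
```

Without dither the sub-step level is simply rounded away to zero every time. With dither each sample is noisier, but the average lands close to the true 0.3, which is the sense in which a little noise can increase resolution.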
I don't know what is relationship between THD and resolution but I suspect that resolution will still bring better sound. Another reason for that is quantization noise that is smaller at higher resolutions. DAC1 does very good job here by using sigma-delta converter that pushes quantization noise to higher bandwidth (oversampling).
I think that our hearing ability ends up slightly above 16-bit perhaps 18-20bits but I'm more concerned with sampling rate because low sampling rate in addition to phase shifts in steep low pass filters increases quantization noise (or size of square steps to make it simpler).
Al, Huge errors applied to the highest harmonics only will result only in small sound change. There will be small difference in sound of cymbals and perhaps in ambiance. I use 16/44 and like it, but try to be educated about it. That's all.
Thanks, Kijanki. The one thing I would question in your comment is the word "huge." I'm sure that a suitably chosen test waveform comprising a very short burst of high frequency energy, and put through a 44.1kHz a/d + d/a, can result in an error that will appear huge when viewed on an appropriate time scale. But as the saying goes the proof is in the pudding, and I've felt amazed at times at how good SOME cd's that contain a lot of transient high frequency energy can sound.
Al, I found video to show what happens when sampling just above Nyquist frequency. It might be possible to fix the output with sinc or other reconstruction functions but only if signal lasts a lot of cycles. If signal is short and disappears reconstruction will have huge error.
Bottom line: I am not losing any sleep over hi rez digital. There are too many ifs to really matter at this point for me and the benefits are marginal compared to the extra cost and overhead associated with even larger data files.
07-01-11: Shadorne Of course, in a studio the signals are manipulated - this creates the need for even greater dynamic range (24 bit or 144 dB) - not that they will necessarily have better S/N but they may want to boost some sounds by 20 dB or so and may apply digital filters (the accuracy of said filters improves significantly if you have more bits)
Excellent point!
06-29-11: Kijanki ... Nyquist-Shannon theorem requires infinite amount of terms (samples). Fixing it with sin(x)/x works poorly for short bursts around 1/2 of the sampling frequency. Sound of instruments producing continuous sound might be not affected (like flute) but anything with transients will sound wrong (piano, percussion instr. etc).
06-30-11: Kijanki Closer you get to Nyquist frequency the more samples you need to properly reconstruct original waveform - not possible to do for short high frequency sounds.
07-01-11: Shadorne Not so. The waveform is perfectly reconstructed. The mathematics are quite rigorous. The main issue with digital is
1. Anti alias filtering (higher frequencies must be eliminated prior to ADC or they can fold in) 2. Jitter
Both of the above add spurious non musical signals. Both can be managed.
In theory Kijanki is correct. An infinitely long series of samples is required for the mathematics to work out perfectly. The consequences of that will be most significant for spectral components that are transient and that approach the Nyquist frequency (i.e., half the sample rate).
The extent to which that may be audibly significant on most recordings is probably conjectural. The Wilson Audio cd I referenced, among many others, leads me to believe that in general it is not a major factor as a practical matter.
Shadorne is of course correct, IMO, in emphasizing the significance of anti-alias filtering and jitter.
On the S/N discussion, this is usually around 100 dB on good gear. I am certain this is achievable because my speakers can hit about 112 dB SPL at the listening position (12 feet back) as measured with an SPL meter whilst I cannot hear any sound (when no music is playing) from the tweeter unless my ear is within about 6 inches. This translates to roughly 100 dB (taking into account the difference in distance which is around 12 dB and assuming the threshold for hearing hiss is around 20 dB in the room with inherent ambient noise around).
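The arithmetic behind that estimate can be sketched as follows. The inputs (112 dB SPL peak, 20 dB hiss threshold, 12 dB distance correction) are the figures given above, and the function names are made up for illustration. The second helper is an added assumption for comparison only: the level drop for an ideal point source in free field, which a real room will not follow exactly because of reflections.

```python
import math

def estimated_snr_db(peak_spl_db, hiss_threshold_db, distance_drop_db):
    """Peak level at the seat minus the hiss level at the seat."""
    hiss_at_seat_db = hiss_threshold_db - distance_drop_db
    return peak_spl_db - hiss_at_seat_db

def free_field_drop_db(near_distance, far_distance):
    """Inverse-square level drop between two distances (same units)."""
    return 20.0 * math.log10(far_distance / near_distance)

print(estimated_snr_db(112.0, 20.0, 12.0))     # the poster's figures
print(round(free_field_drop_db(0.5, 12.0), 1)) # 6 inches vs 12 feet, in feet
```

With the 12 dB correction the result is 104 dB, in line with "roughly 100 dB". The free-field figure for a 24:1 distance ratio is larger (about 27.6 dB), so the 12 dB used here looks like the conservative end of the range.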
I think the ambient room noise and the speakers peak clean SPL are the limiting factors in a typical setup.
I think tape hiss or vinyl noise is limiting you to about 60 or 70 dB dynamic range on analog recordings.
I think high quality digital recordings can probably achieve around 90 dB dynamic range - limitations being the ambient noise during the recording process.
This is why CD is more than good enough for playback. This is why there are a few redbook CD recordings that are world class.
Of course, in a studio the signals are manipulated - this creates the need for even greater dynamic range (24 bit or 144 dB) - not that they will necessarily have better S/N but they may want to boost some sounds by 20 dB or so and may apply digital filters (the accuracy of said filters improves significantly if you have more bits)
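The 144 dB figure follows from the usual rule of thumb of about 6 dB per bit. A minimal sketch, using the simple ratio of full scale to one step (the helper names are made up, and this ignores dither and the extra 1.76 dB that appears in the formula for a full-scale sine):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)

def bits_for_db(db):
    """How many bits of word length a given level change corresponds to."""
    return db / (20.0 * math.log10(2.0))

print(f"16 bit: {dynamic_range_db(16):.1f} dB")
print(f"24 bit: {dynamic_range_db(24):.1f} dB")
print(f"a 20 dB boost uses up about {bits_for_db(20.0):.1f} bits")
```

So boosting a quiet track by 20 dB in the studio costs a little over three bits of resolution, which is easy to absorb with a 24 bit word and much harder with 16.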
Closer you get to Nyquist frequency the more samples you need to properly reconstruct original waveform - not possible to do for short high frequency sounds.
Not so. The waveform is perfectly reconstructed. The mathematics are quite rigorous. The main issue with digital is
1. Anti alias filtering (higher frequencies must be eliminated prior to ADC or they can fold in) 2. Jitter
Both of the above add spurious non musical signals. Both can be managed.
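As a small illustration of the first point, this sketch computes where a tone above half the sample rate would "fold in" if it reached the ADC unfiltered. The function name and the example tones are assumptions for illustration only.

```python
def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears after folding
    around multiples of the sample rate."""
    folded = tone_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded

for tone in (10000, 25000, 30000):
    print(f"{tone} Hz sampled at 44100 Hz shows up at "
          f"{alias_frequency_hz(tone, 44100)} Hz")
```

An ultrasonic 30 kHz component ends up at 14.1 kHz, well inside the audible band and unrelated to the music, which is why it has to be filtered out before conversion.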
Irvrobinson - I assume that you buy properly sized amp for the speakers and the room. My amp is rated 150W at 6ohm and I am pretty sure I am getting peaks even larger than that (headroom). It corresponds to largest digital number coming from CD - meaning covers full dynamic range. If you listen at 1W then I agree that you have no chance to experience full dynamic range, not only because of the noise floor of the amp but more likely because of the ambient noise and threshold of our hearing.
To test if power amp is limiting factor is very simple - Just turn on power amp, set volume to zero and listen. Can you hear anything? I cannot - dead silent. If I cannot hear anything in very quiet room in my listening position why even bring numbers into discussion?
As for Nyquist - digital reproduction is decent from 16/44 media and, according to reviews, pretty good with SACD. I seriously doubt that they would release 24/192 master tapes to public. What is released right now as high resolution is often the same as 16/44 (I read article about it). SACD is a different story because it cannot be copied (pit width modulation) but it does not work with the server and selection is very limited. I settled at 16/44 for all the reasons I mentioned before but understand its limitations. I adjusted my gear accordingly with very forgiving Hyperion speakers.
I dunno, Kijanki, I randomly looked at two good power amps in JA's testing, a Moon and a Pass, and they both had measured s/n ratios of about -84db at 1 watt, which is actually excellent performance. You keep forgetting that most power amps have about 30db of gain. Measuring s/n ratio at full power is sort of cheating for marketing's sake. A speaker with 95db/2.83v/1m efficiency, like my old Legacy Focus, will let you hear the hiss from such an amp rather readily. So I still contend that for listening the amps are the limiting factor, not well implemented 16/44.
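The gap between the two ways of quoting amplifier s/n can be sketched with one line of arithmetic. It assumes the amp's noise floor stays the same regardless of output level, and it pairs the 84 dB at 1 W figure with a 150 W rating purely as an example; the function name is made up.

```python
import math

def snr_at_full_power_db(snr_at_1w_db, full_power_w):
    """Re-reference an s/n figure measured at 1 W to rated power,
    assuming a constant noise floor."""
    return snr_at_1w_db + 10.0 * math.log10(full_power_w)

print(f"{snr_at_full_power_db(84.0, 150.0):.1f} dB")  # same amp, quoted at 150 W
```

On that assumption, 84 dB at 1 W and roughly 106 dB at 150 W describe the same noise floor, so the two sets of numbers being quoted in this thread are not necessarily in conflict. Which reference matters depends on how loud you actually listen.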
As for your comments about Nyquist, it would seem your real thesis is that digital reproduction isn't very good, even with a DAC1. I still wonder, why does it sound so good if you're correct? I'm missing something.
Al, Thank you. You brought very important point - quality of the recording engineering (on the top of compression issue that Shadorne mentioned). Average quality is not very high while some of the recordings I have are just incredibly good. Perhaps I'm arguing too much for the best case scenario while average quality of the recording was another reason for me to stay with 16/44 and Benchmark DAC1.
BTW, I should have added to my previous post that I am in agreement with all of the technical points Kijanki has made, which I think have been very well presented. An additional point which I don't think has been mentioned is that brickwall anti-aliasing filters introduce some degree of ripple into the passband frequency response characteristics, as I understand it.
The audible significance of all of the effects that have been mentioned, though, is perhaps unanswerable in a definitive manner, given the extent to which those effects tend to be overshadowed by variability in recording engineering and quality.
What can I say - I posted example showing that quality amplifier is not a limiting factor. Why not to respond to that? If you think you can find any mistake in my reasoning please say so. Even for my own Rowland 102 (a class D amp) dynamic range is stated as 110dB while Rowland 301 is rated 120dB. Every Krell is at least 106dB unweighted (Evolution 900e is 113dB unweighted related to full power). You can search for a bad amp but the point was to show that the amp is not the limiting factor.
As for the Benchmark DAC1 again - If you cannot hear the difference then you can not, but please don't bring Nyquist into discussion since his theorem was intended toward stationary waveforms (infinite number of samples). Closer you get to Nyquist frequency the more samples you need to properly reconstruct original waveform - not possible to do for short high frequency sounds. That is the main reason so many people still stay with vinyl (unless you think they like convenience).
Irvrobinson 6-30-11: I still haven't heard a piano recording superior to the ancient Telarc CD of Malcolm Frager playing Chopin.
If you can find it, try Wilson Audio WCD-9129, Chopin's Piano Sonata No. 3 in B Minor, Op. 58 (and other shorter works), performed by Hyperion Knight. The best reproduction of solo piano in my experience, and it's on a 1991 redbook cd!
First of all, I totally agree about compression running rampant these days, especially for drum kits. My wife is a drummer, so I know what drums really sound like, and only a few recordings give you a hint of their dynamic range. In fact, she and I both lament that a lot of modern recordings don't even use real drums any more, only those electronic travesties, for ease of recording.
To expand on the differences I've heard with hi-res on the DAC1, sometimes I think I hear a difference, in that some hi-res recordings seem to reveal something I've never heard before, but then I go back to a CD and hear similar things. Or I find an awesome recording on CD that sounds better than anything I've heard on hi-res. For example, oddly, I still haven't heard a piano recording superior to the ancient Telarc CD of Malcolm Frager playing Chopin. That old Soundstream recording even forces an odd conversion to 16/44, and it still sounds great.
As for Kijanki's comment that the s/n ratio of most amps is specified at 1W, I say check again. All of the obvious ones I've checked reference full power, and most can't break -85db at 1W into 8ohms. JA's measurements in Stereophile are very interesting in this regard. (His measurements are the only reason I read the magazine.)
I think redbook CD format specifies the dynamic range for the 16 bit format and is fairly standardized as a result.
Not sure this is the case with other newer hi rez digital formats?
More bits enables more dynamic range and more detail together. How this happens might be highly variable in lieu of a standard.
In any case, for hi rez digital sources, I suspect a difference associated mainly with the high frequencies can be heard if done right, but that may be a big if at this juncture still.
To hear the most possible, you definitely want very good, younger ears, speakers that can handle dynamics and transients well and also have very good detail assuming the production is done well and the DAC not only reads the format but is able to output analog of similar resolution and quality.
At this still emerging stage of hi rez digital audio, I doubt it is a safe bet that hi rez source material and playback gear meet these requirements well in general, although I am sure there are some reference type recordings and better gear that do.
The first place I would listen for the difference is in well recorded massed bowed strings in orchestral music. Use a good modern RTR reference recording as a reference standard. Even older trained ears should hear a noticeable difference if the digital is not extremely well done.
I have had the opportunity to listen to RTR, vinyl, and good redbook CD recordings on a very well done dealer system using mbl 111e speakers. The difference from RTR to redbook CD was pronounced but you might not notice the limitations of the redbook CD unless compared to the RTR or even good vinyl on a similarly good system.
The SOTA wide and deep soundstage in this optimized and very resolving mbl setup provided exactly the venue size and 3-dimensional sound quality needed to be able to hear these kinds of differences clearly. Quite an eye (ear?) opener!
Shadorne, agree about compression but it is simply who is driving the market. Release uncompressed piano recording (about 96dB dynamics) and a lot of people will complain that on their boom boxes or shelf speakers woofers are constantly buzzing. Hi-res has different clientele so they reduced compression a bit but it is still bad. Also, as you mentioned, they try to make average loudness as high as possible because to inexperienced customer it appears as higher quality recording especially with poorly resolving systems.