Audiogon Discussion Forum

@thespeakerdude

... These theories are tested day in and day out ...

... They do what they do accurately, understanding their limitations ...

I agree with these two statements of yours. What I was referring to are approaches where limitations of the classic theory were disregarded. Let me try to explain from a different angle what I meant, using a concrete example.

Imagine an audio signal which is a sinusoid of frequency 12 kHz, with amplitude described as a piecewise function of two segments, each linear on a dB scale. The first segment goes from 0 to 100 dB SPL during the first half cycle of the sinusoid. The second segment goes from 100 dB SPL to below quantization noise during the next four cycles of the sinusoid.

Try to sample it with 16/44.1. Then try to reconstruct the signal from the samples. Then shift capture time of the first sample by 1/8 of the sinusoid period. Repeat the exercise.

What you’ll find is that, first, reconstruction will be pretty rough, and second, that it will be wildly changing with even small shifts of the first sample capture time.
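A minimal sketch of the sampling half of this experiment, under some assumptions that are not in the description above: 100 dB SPL is mapped to digital full scale, the decay segment ends at 0 dB SPL, and the signal is sampled directly with no anti-aliasing filter in front. It does not attempt the reconstruction step; it only prints the two sets of 16-bit sample values side by side so the effect of the 1/8-period shift can be inspected.

```python
import math

FS = 44_100.0        # CD sampling rate, Hz
F0 = 12_000.0        # sinusoid frequency from the example, Hz
PEAK_DB = 100.0      # peak level, dB SPL (assumed to map to full scale)
ATTACK_CYCLES = 0.5  # rise from 0 to 100 dB SPL
DECAY_CYCLES = 4.0   # fall from 100 dB SPL to 0 dB SPL (assumed end level)


def level_db(t):
    """Piecewise-linear envelope on a dB scale; None outside the burst."""
    cycles = t * F0
    if cycles < 0.0 or cycles > ATTACK_CYCLES + DECAY_CYCLES:
        return None
    if cycles <= ATTACK_CYCLES:
        return PEAK_DB * cycles / ATTACK_CYCLES
    return PEAK_DB * (1.0 - (cycles - ATTACK_CYCLES) / DECAY_CYCLES)


def burst(t):
    """Burst value at time t, normalized so 100 dB SPL has amplitude 1.0."""
    db = level_db(t)
    if db is None:
        return 0.0
    amplitude = 10.0 ** ((db - PEAK_DB) / 20.0)
    return amplitude * math.sin(2.0 * math.pi * F0 * t)


def sample_16bit(offset_s, count=18):
    """Sample at 44.1 kHz starting at offset_s, rounding to 16-bit codes."""
    codes = []
    for n in range(count):
        x = burst(offset_s + n / FS)
        codes.append(int(math.floor(x * 32767.0 + 0.5)))
    return codes


period = 1.0 / F0
aligned = sample_16bit(0.0)           # first sample at the burst start
shifted = sample_16bit(period / 8.0)  # first sample 1/8 period later

for a, b in zip(aligned, shifted):
    print(f"{a:7d} {b:7d}")
```

The whole burst lasts 4.5 cycles, or 375 microseconds, so only about 16 to 17 samples fall inside it at 44.1 kHz.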

From the Fourier analysis viewpoint, this is an example of a signal whose spectrum extends significantly beyond 20 kHz, which makes sampling at 44.1 kHz untenable and the result of the reverse transform unpredictable.

Yet from the standpoint of the human hearing system, such a signal is perfectly valid, and will result in physiological reactions inside several inner hair cells. Most likely, if it manages to evoke a sensation of pitch in a particular individual, the perceived pitch frequency will be close to the intended 12 kHz.

An analog system doesn’t care about the sampling frequency, and at what precise moment of time the first sample happens to be taken, and would capture this signal fully, with some distortions of course, yet nevertheless it will capture the shape definitively. And it will be reconstructed definitively as well.

Imagine further, that some short time later, another signal comes in, which is exact reversal of the first one.

Depending on the time difference of the signals start, the sampled values of the second signal may range from exact opposites of the first set of sampled values to something seemingly unrelated.

Once again, human hearing system, with its half-wave rectification capability, will react to the second signal in a similar way it reacted to the first. And once again, the analog system, not restrained by sampling and time shift considerations, will capture the second signal fully.

If, on the other hand, we significantly increase the duration of the piecewise linear segments, let’s say the first segment goes up for 100 cycles and the second one goes down for 1,000 cycles, then 16/44.1 sampling with subsequent reconstruction will produce a much more agreeable result.

So, I gave an example of a signal which is meaningful and definitive both from the hearing systems and analog recording standpoints, yet non-definitive from the digital sampling standpoint.

Also, an example of a signal with the same general shape, yet with different duration of its characteristic segments. Which happens to be both meaningful and definitive from all three standpoints.

Which illustrates the limitations of digital sampling and classic Fourier-analysis-based DSP: they work well enough in most practically encountered cases, yet not always.

In contrast, analog may be worse in most cases in terms of distortions and noise, yet it works consistently in all practically encountered cases, which may be important for recording and reproduction of certain genres of music.

Increasing the sampling rate effectively rescales the problem: certain signal fragments and components which couldn’t be perceptually transparently captured at a lower sampling rate are now captured well enough at the increased sampling rate.

At the limit, sampling at increasing rate becomes perceptually equivalent to analog recording, sans the distortions and noise. At which point does it happen? It depends greatly on the characteristics of music, and on critical listening abilities of the person who tries to enjoy that music.

Correspondingly, the highest frequency that we can hope to encode with similar fidelity as DSD will be 44,100 / 2 / 64 = 344.5 Hz. Say goodbye to the "micro expression of transients and micro transients"!

I am going to highlight this last paragraph. This is 100% false. That is not how DSD works. The single bit in DSD is not equivalent to a single bit change in PCM. No direct comparisons can be made. Hence your conclusion cannot be made and can be assumed false.

Let me clarify. I wrote "encoded" meaning that we could use the remaining still available stream of one-bit values to encode in the same way that DSD does. Of course bits are used differently by PCM and DSD - pulse vs delta etc.

That was to illustrate the point that the amount of information per second remaining available, in the case if we’d decided to use 15 bits for encoding of dynamic range, is indeed equivalent to a very low-fi format.
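The arithmetic behind that estimate can be written out as a short sketch. It only reproduces the numbers of the estimate itself (one bit per 44.1 kHz sample left over after 15 bits are set aside, compared against a 1-bit stream at 64 times 44.1 kHz, i.e. DSD64); the variable names are illustrative, and the sketch does not settle whether the comparison between the two formats is valid.

```python
CD_RATE_HZ = 44_100      # CD sampling rate
CD_BITS = 16             # CD bit depth
RANGE_BITS = 15          # bits set aside for dynamic range in the estimate
DSD64_MULTIPLIER = 64    # DSD64 runs at 64 x 44.1 kHz, one bit per sample

# Bits per second still available after reserving the dynamic-range bits.
remaining_bit_rate = CD_RATE_HZ * (CD_BITS - RANGE_BITS)

# Bits per second of a DSD64 stream.
dsd64_bit_rate = CD_RATE_HZ * DSD64_MULTIPLIER

# How many times smaller the remaining stream is than DSD64.
ratio = dsd64_bit_rate / remaining_bit_rate

# Scale the 22.05 kHz audio band down by the same ratio.
equivalent_bandwidth_hz = CD_RATE_HZ / 2 / ratio

print(remaining_bit_rate, dsd64_bit_rate, ratio, equivalent_bandwidth_hz)
```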

There are two flaws in your statement of equivalence 11 bits and 0.03% distortion detection. More like 3 flaws. That distortion limit is at full scale.

To understand what I meant, look at the physical bits of the quietest PCM-encoded signal in this context. All the upper bits, which I called "used for encoding dynamic range", will be zero.

It is not that these specific bits of PCM stream would be always used for encoding dynamic range. What counts is the number of bits that we have to keep unused while encoding the quietest segment of music.

Secondly, please take into account that human hearing system is capable of adjusting its sensitivity, and symphony composers tend to use this factor fully.

The symphonies typically have quiet segments, when a neighboring spectator shuffling her purse may be pretty distracting, and they also have short bursts of apotheosis, with SPL falling just short of hearing system pain threshold.

In the context of a quiet segment, the perceived distortion level threshold is scaled down. That’s why I do indeed consider it as if it was a full-scale signal.

There are other factors of course: e.g. the equivalent loudness curve shifts. Yet if we only consider the most stable part of the curve, at mid-frequencies, the rule-of-thumb calculations generally work, plus-minus a bit.

Assume your stereo is set for 100 dB peaks, which is fairly loud, and you have low distortion playback. There is a particular distortion level evident at that volume. In your analysis, you are claiming to be able to hear distortion at the bit level, on sounds that are only 70 dB. Are you claiming to be able to hear 0.03% distortion on a 70 dB peak signal?

That would depend on nature of the music fragment, right? And on my hearing ability. In general, I didn’t claim anything of the sort. Only that, as an order of magnitude estimation, an amp with 0.3% THD is usually considered low quality, an amp with 0.003% THD very high quality. The middle on logarithmic scale: 0.03%, was considered in enough accounts I found credible as a threshold of quality.
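For reference, the three THD figures can be converted to decibels and to an equivalent number of bits. This is a small sketch using the standard relations (20·log10 for amplitude ratios and roughly 6.02 dB per bit); the function names are made up for illustration, and the conversion says nothing about audibility.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB per bit


def thd_percent_to_db(percent):
    """Distortion level relative to the signal, in dB (negative number)."""
    return 20.0 * math.log10(percent / 100.0)


def db_to_bits(db):
    """Number of bits whose range spans the given level difference."""
    return abs(db) / DB_PER_BIT


low_quality = 0.3      # percent THD, "usually considered low quality"
high_quality = 0.003   # percent THD, "very high quality"

# Middle of the two on a logarithmic scale is their geometric mean.
middle = math.sqrt(low_quality * high_quality)

for percent in (low_quality, middle, high_quality):
    db = thd_percent_to_db(percent)
    print(f"{percent:.3f}% THD = {db:.1f} dB = {db_to_bits(db):.1f} bits")
```

With these relations 0.03% sits about 70 dB below the signal, which is between 11 and 12 bits.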

Further, CD is dithered. Dither improves the dynamic range where our hearing is most sensitive, in exchange for added noise where it is not. That extends the dynamic range, where we are most sensitive, to 110 dB. Your argument fails with that information.

Dithering is helpful in most practical cases. Yet, if you look at the mathematical derivations of the common dithering schemes, you’ll see that the characteristic duration of signal stability is a factor in calculations.

Similarly to the examples I gave earlier in this reply. If a signal is composed of slowly changing sinusoids, dithering helps a lot.

If a signal consists mostly of harmonic components quickly changing their amplitudes, non-harmonic transients, and frequently appearing/disappearing components, dithering is not as effective.
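The dependence on signal stability can be illustrated with a toy sketch. The assumptions here are mine: a constant level of 0.3 LSB stands in for the extreme case of a slowly changing signal, dither is triangular (TPDF) with a span of plus-minus 1 LSB, and "recovering" the level simply means averaging the quantized output over a window. The seed is fixed only to make the run repeatable.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer LSB (ties go up)."""
    return math.floor(x + 0.5)


def mean_quantized(level, count, dithered, rng):
    """Average of `count` quantized samples of a constant level (in LSB)."""
    total = 0
    for _ in range(count):
        # Difference of two uniforms is triangular on (-1, 1) LSB.
        dither = rng.random() - rng.random() if dithered else 0.0
        total += quantize(level + dither)
    return total / count


rng = random.Random(1)
LEVEL = 0.3  # sub-LSB signal level, hypothetical

plain = mean_quantized(LEVEL, 20_000, False, rng)      # no dither
long_window = mean_quantized(LEVEL, 20_000, True, rng)  # long stable stretch
short_window = mean_quantized(LEVEL, 8, True, rng)      # very brief stretch

print(plain, long_window, short_window)
```

Without dither the level is lost entirely. With dither, the long window averages close to 0.3, while an 8-sample window can land far from it, which is the sense in which the duration of signal stability matters.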

>>> So, for faithful reproduction of a symphony we would need
>>> 90 / 6 = 15 bits for encoding the dynamic range, and 14 bits for
>>> encoding the shape of the signal. 15 + 14 = 29 bits.

This is obviously not at all accurate. You are stacking flaws in your understanding of how digital works to come to incorrect conclusions.

I believe at that point I provided enough explanations. Your reactions are quite typical of engineers who consider the classic DSP based on Fourier Analysis the only true paradigm.

From my perspective, it is only absolutely true for abstract mathematical constructs.

It is nothing but useful approximation of real world. One ought to be very careful with the corner cases, where the abstractions stray too far away from the phenomena they are supposed to model.

The digital bit depth only needs to be large enough to encompass the full dynamic range.

As I highlighted, the approach you are advocating doesn’t address the need of having some bits left available for encoding the shape of signal faithfully enough to be perceived as distortions-free.

The theory I use explains well enough why the so-called Loudness Wars can be considered a rational, professionally responsible reaction to deficiencies of CD, the most widely used audio recording format at the time.

This theory explains why some listeners still prefer listening to LP for some genres of music, despite the fact that, according to the classic theory, CD is vastly superior. Once again, this is a rational and responsible reaction.

The theory explains, with precision good enough for me personally, why most professional sound mixing and mastering studios didn’t advance beyond the 24/192 PCM format.

It also explains why some modern symphony recording engineers moved to 24/384 and DSD256 formats. And other phenomena I otherwise couldn’t explain.

By shifting noise, we don’t even need that many bits for the dynamic range. DSD has 1 bit depth. The noise is shifted to provide large dynamic range. CD has 16 bits. The noise is shifted to increase the dynamic range.

DSD is a delta format. Formally, general DSD has unlimited bit depth, and thus dynamic range. It is only constrained in specific versions of the format to correspond to a set bit depth at a PCM-equivalent sampling rate.

The noise considerations started to amuse me lately. Practical examples were a trio of class-D power amplifiers, highly regarded by ASR. I bought them over the years, evaluated them, and quickly got rid of them, due to distortions I found intolerable.

Yet the SINAD of these amplifiers was excellent. Which made me look closely at SINAD measurement procedures. Long story short, SINAD is predicated on taking a Fourier transform over a very long window, of a signal comprising a set of sinusoids with equal and unchanging amplitudes.
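A simplified sketch of a steady-tone SINAD calculation, to make the "long window, unchanging amplitude" point concrete. It is not any particular analyzer's procedure: it assumes a single test tone, a window holding a whole number of cycles, and no DC offset, and it projects onto the fundamental directly instead of running a full FFT. The 1% third harmonic stands in for a device's distortion.

```python
import math


def sinad_db(x, fs, f0):
    """Ratio of fundamental power to everything else, in dB.

    Assumes the window holds a whole number of cycles of f0 and that
    the signal has no DC component.
    """
    n = len(x)
    w = 2.0 * math.pi * f0 / fs
    # Project the signal onto sine and cosine at the fundamental.
    a = 2.0 / n * sum(v * math.sin(w * k) for k, v in enumerate(x))
    b = 2.0 / n * sum(v * math.cos(w * k) for k, v in enumerate(x))
    p_fund = (a * a + b * b) / 2.0
    p_total = sum(v * v for v in x) / n
    # Whatever is not the fundamental counts as noise plus distortion.
    return 10.0 * math.log10(p_fund / (p_total - p_fund))


FS = 48_000.0   # sample rate of the hypothetical analyzer, Hz
F0 = 1_000.0    # test tone, Hz
N = 4_800       # 0.1 s window, exactly 100 cycles of the tone

tone = [
    math.sin(2.0 * math.pi * F0 * k / FS)
    + 0.01 * math.sin(2.0 * math.pi * 3.0 * F0 * k / FS)  # 1% 3rd harmonic
    for k in range(N)
]

result = sinad_db(tone, FS, F0)
print(f"SINAD = {result:.2f} dB")
```

For this test signal the figure comes out at about 40 dB. Because both power sums run over the entire window, the number describes the steady-state tone and carries no information about when within the window any error occurred.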

Where all three failed miserably for me was reproduction of low-signal-level transients, something SINAD doesn’t capture all that well. Yet the theory I use explained their behavior rather precisely. It also predicted what power amplifiers would be more acceptable to me.

>>> However, in order for the quietest signal to be still
>>> distinguishable, it only needs to be 6 dB, or 1 bit, above the noise
>>> floor. This leaves an equivalent of 11 bits for dynamic range,
>>> which is more than twice of the 5 bits of the usable CD dynamic
>>> range.

You are basing this conclusion on a stack of fundamental flaws. It does not represent reality. More accurate is that we can hear below the noise floor.

It depends on the nature of noise and nature of signal, doesn’t it? For white noise and a short sinusoidal burst, I’d agree with you. I’m more interested in a typical music signal, with spectrum close to pink noise, masked by pink noise. In that case, having it 6 dB over the noise floor results in more reliable perception.

>>> Viewed from this perspective, LP has twice as wide usable dynamic
>>> range in comparison with CD. But higher noise and distortions.

This is also based on a stack of flawed assumptions. It is incorrect.

Not on assumptions. On theories. Fitting experimental facts. The theory I use is more sophisticated than the classic one, taking into account analog characteristics of human hearing system.

On its simplest level, instead of considering just dynamic range, it also considers the shape of what the dynamic range is applied to. Once this is done, preference for LP in certain situations ceases to be a mystery.

Cochlea is not a Fourier transforming machine. In some regards it is more crude, yet in others it is far more advanced. As an example, it starts noticeably reacting only after observing two cycles of a pure sinusoid, virtually irrespective of frequency.

For higher frequencies, at a 44.1 kHz sampling rate, this may correspond to only a few samples. The shape of a quickly changing signal can’t be faithfully captured by such a small number of samples.
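To put numbers on "only a few samples", here is a short sketch that counts how many 44.1 kHz samples fall within two cycles of a sinusoid. The list of frequencies is arbitrary and chosen for illustration.

```python
FS = 44_100.0  # CD sampling rate, Hz


def samples_in_cycles(freq_hz, cycles=2.0, fs=FS):
    """Number of sample periods spanned by `cycles` cycles at freq_hz."""
    return cycles * fs / freq_hz


for freq in (1_000.0, 5_000.0, 12_000.0, 20_000.0):
    print(f"{freq:8.0f} Hz: {samples_in_cycles(freq):6.2f} samples")
```

At 12 kHz, two cycles span 7.35 samples; at 20 kHz, 4.41 samples.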

Once we get into signals comprised of quickly appearing and disappearing components, the simple intuition good enough for the previous example no longer works, and math becomes much heavier, yet fundamentals remain: the higher the sampling rate (assuming equal quantization accuracy), the deeper the bit depth (assuming equal timing accuracy), the better it gets.

And yes, I’m aware of the oversampling nature of practical ADC and DAC. Of the fact that internally they are sampling/reconstructing signal at significantly higher rates, and then encode adjustments not only into the slower-sampled values within the signal time range, but also outside it.

Still, Information Theory is a bitch. If there aren’t enough bits to encode the changes in the signal that would be noticed by the cochlea, some meaningful information will be lost. I did some experiments on fragments of music that I recorded and mixed myself. The distortions of 16/44 compared to 24/192, albeit subtle, mostly manifested themselves as uneven rhythm of smaller-volume transients.

Has anyone been able to define well or measure differences between vinyl and digital?