Why Do So Many Audiophiles Reject Blind Testing Of Audio Components?


Because it was scientifically proven to be useless more than 60 years ago.

A speech scientist by the name of Irwin Pollack conducted an experiment in the early 1950s. In a blind ABX listening test, he asked people to distinguish minimal pairs of consonants (like “r” and “l”, or “t” and “p”).

He found that listeners had no problem telling these consonants apart when they were played back immediately one after the other. But as he increased the pause between the playbacks, the listeners’ ability to distinguish between them diminished. Once the time separating the sounds exceeded 10-15 milliseconds (approximately 1/100th of a second), people had a really hard time telling obviously different sounds apart. Their answers became statistically no better than a random guess.
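For anyone wondering what "statistically no better than a random guess" means in practice, here is a minimal sketch of the usual exact binomial check for an ABX session. It is not taken from the studies mentioned here; the function name abx_p_value and the 16-trial example are assumptions made purely for illustration.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX session.

    Returns the probability of getting at least `correct` answers right
    out of `trials` when every answer is a coin flip (p = 0.5).
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, then divide by
    # the total number of equally likely outcomes (2 ** trials).
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Hypothetical sessions of 16 trials each (made-up numbers).
    for hits in (9, 12):
        print(hits, "of 16 correct -> p =", round(abx_p_value(hits, 16), 4))
```

A small p-value (0.05 is the conventional cutoff) means the score is unlikely to come from guessing alone; a large one means the result cannot be told apart from coin flips.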

If you are interested in the science of these things, here’s a nice summary:

Categorical and noncategorical modes of speech perception along the voicing continuum

Since then, the experiment has been repeated many times (the last major update was in 2000: Reliability of a dichotic consonant-vowel pairs task using an ABX procedure).

So reliably recognizing the difference between similar sounds in an ABX environment is impossible. Introduce a 15 ms playback gap, and the listener’s guess becomes no better than random. This happens because humans don't have any meaningful waveform memory. We cannot exactly recall the sound itself, and rely on various mental models for comparison. It takes time and effort to develop these models, which makes us really bad at playing the "spot the sonic difference right here and now" game.

Also, please note that the experimenters were using the sounds of speech. Human ears have significantly better resolution and discrimination in the speech spectrum. If a comparison method is not working well with speech, it would not work at all with music.

So the “double blind testing” crowd is worshiping an ABX protocol that was scientifically proven more than 60 years ago to be completely unsuitable for telling similar sounds apart. And they insist all the other methods are “unscientific.”

The irony seems to be lost on them.

Why do so many audiophiles reject blind testing of audio components? - Quora
artemus_5
"A-B testing, blinded or no, performed by an experienced reviewer with discerning ears provides a possibly useful data point guiding the consumer on their quest, nothing more."

I would add  "...performed by an honest and experienced reviewer...", meaning "with no conflict of interest of any sort".

"...as only 1 person is making a claim."

Not around here. You may need to hire a scheduler. The line is longer than in front of the Nike store on the day of a sneaker release.
I'll jump in and add my point of view. I'm a cognitive psychologist and also appreciate good audio. As a psychologist I have worked most of my professional life quantifying people's perceptions of products using various psychometric techniques. I correlate perceptions with physical features of products to inform designers, engineers, and marketing specialists which product features are most closely aligned with the desired experience for their intended market. Most of my research is "blind" in that my subjects do not know the origin of any particular "stimulus" (product) that they experience. This is done to reduce bias from unintended influences such as brand identity or product features that are not relevant to the research.

There are some important additional points I wanted to make for audio. Sensory scientists have found it necessary to distinguish between two types of perception. First is the veridical description of a sensory stimulus, i.e., its visual, auditory, tactile, gustatory, or olfactory qualities. This strikes me as being similar to what most audiophiles strive for. It is noteworthy that such a so-called descriptive analysis is left in the sensory sciences to trained experts. The image of a wine connoisseur may come to mind. The reason is that most "average" people lack both the sensitivity to detect subtle physical properties of products and the vocabulary to reliably describe them.

Most companies take their typical consumers into account with a second type of perceptual measurement, which describes the subjective perceptual experience (i.e., feelings and emotions) of their customers. In this case, a sample of people is required because results are based upon statistical estimates of a sample of perceptual judgements. But consumers are asked very different questions than experts, such as "do you like it?" or "is it pleasing?". In my own research I rely on the psychological theory of Semantic Differentials, which describes three underlying psychological dimensions of experience: a) Valence (like/don't like), b) Strength (strong/delicate), and c) Arousal (stimulating/relaxing). In addition, I find a fourth Semantic dimension of Novelty (familiar/unfamiliar) is required to describe the full perceptual experience of actual consumers. These psychological dimensions are common to all humans (accounting for differences in language) and are bipolar in nature. That is, for each dimension experience ranges between two polar opposite extremes with a "neutral" point in the middle.

Importantly, only Valence has an obviously preferred polarity, i.e., "don't like" is always a bad thing. The other three Semantic dimensions may range anywhere between the bipolar extremes depending upon one's design goals. So, how do you know what is the best product? A similar analysis of a comparison stimulus may serve that purpose and that seems similar to what is often described in the audio community. But, an imagined "ideal" experience may also be used. I have had subjects in my research imagine their "ideal" product and rate it prior to experiencing the actual products and that provides a target experience profile for actual products. Differences between target and actual products may then be statistically compared within each Semantic dimension.
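To make the target-versus-actual comparison concrete, here is a rough sketch of a paired comparison on a single Semantic dimension. It assumes each subject rates both the imagined "ideal" and the actual product on the same bipolar scale (say -3 to +3); the function name, the scale, and the ratings are all hypothetical, and a paired t statistic is only one of several tests that could be used here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(ideal, actual):
    """Return (mean difference, paired t statistic) for one dimension.

    `ideal` and `actual` hold the same subjects' ratings, in the same
    order. A difference is actual minus ideal, so a negative mean
    means the real product falls short of the imagined target.
    """
    if len(ideal) != len(actual) or len(ideal) < 2:
        raise ValueError("need two equal-length lists with 2+ subjects")
    diffs = [a - i for i, a in zip(ideal, actual)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    avg = mean(diffs)
    return avg, avg / (spread / sqrt(len(diffs)))


if __name__ == "__main__":
    # Made-up Valence ratings from four subjects, for illustration only.
    ideal_valence = [2, 3, 1, 2]
    actual_valence = [1, 1, 0, 2]
    print(paired_t(ideal_valence, actual_valence))
```

The same function would be run once per dimension (Valence, Strength, Arousal, Novelty), and the t statistic compared against a t distribution with n - 1 degrees of freedom.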

An important consequence of the multidimensional approach is that two or more products may be both similar to one another with respect to one Semantic dimension and different from one another with respect to other Semantic dimensions. This might explain the seemingly never-ending debate about whether audio systems are different or not. The answer may be that they are both similar and different depending upon which Semantic dimension of experience you attend to. In my research, I always report the entire profile of Semantic scores for each product so that similarities and differences may be directly compared. 

One last point (finally!) is that different physical properties of products correlate with scores on each of the four Semantic dimensions. This provides actionable information to fine tune a product to a particular desired level of Semantic experience. I have done a good bit of this sort of thing in my career including with acoustics. Specific acoustic requirements for a particular Semantic profile can be obtained by correlating various acoustic metrics (or expert judgements) with the Semantic scores for a sample of consumers. I have used similar methods to define requirements for visual and tactile qualities of products as well.
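As a rough illustration of the correlation step, the sketch below computes a Pearson correlation between one acoustic metric and the mean Semantic score for a set of products. The metric, the products, and every number are hypothetical, and the real analysis may well use other statistics.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with 2+ values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up data: a loudness-type metric for five products and the
    # mean Arousal score each product received from a consumer sample.
    metric = [60.0, 64.0, 67.0, 71.0, 75.0]
    arousal = [-1.2, -0.4, 0.1, 0.9, 1.5]
    print(round(pearson_r(metric, arousal), 3))
```

A coefficient near +1 or -1 suggests the metric tracks that dimension closely; a value near 0 suggests it tells you little about it.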

artemus_5
 
Science is the art of reducing the field of the unknown by making objective observations and measuring them.
Religion is the art of camouflaging the field of the unknown with dogmatic certainties that are usually not corroborated by objective observation.