Reviews with all double blind testing?


In the July 2005 issue of Stereophile, John Atkinson discusses his debate with Arnold Krueger, who Atkinson suggests fundamentally wants only double blind testing of all products, in the name of science. Atkinson goes on to discuss his early advocacy of such methodology and his realization that the conclusion it produced, that all amps sound the same, proved incorrect in the long run. Atkinson's double blind test involved listening to three amps, so it apparently was not the typical same/different comparison advocated by proponents of blind testing.

I have been party to three blind tests and several "shootouts," which were not blind tests and thus resulted in each component having advocates, as everyone knew which was playing. None of these ever resulted in a consensus. Two of the three blind tests were same/different comparisons; neither resulted in a conclusion that people could consistently hear a difference. The third was a comparison of about six preamps, and there the substantial consensus was that the Bozak preamp surpassed more expensive preamps, with many designers of those preamps involved in the listening. In every case there were individuals at odds with the overall conclusion, in no case were those involved a random sample, and in no case were more than 25 people involved.

I have never heard of an instance where "same versus different" methodology concluded that there was a difference, but apparently comparisons of multiple amps, preamps, etc. can result in one being generally preferred. I suspect, however, that those advocating db mean only "same versus different" methodology. Do the advocates of db really expect that the outcome will always be that people can hear no difference? If so, is it that conclusion that underlies their advocacy, rather than the supposedly scientific basis for db? Some advocates claim that if a db test found people capable of hearing a difference, they would no longer be critical, but is this sincere?

Atkinson puts it in terms of double blind test advocates wanting to be right rather than happy, while their opponents would rather be happy than right.

Tests of statistical significance also get involved here, as some people can hear a difference; but if they are too few to achieve statistical significance, proponents say we must accept the null hypothesis that there is no audible difference. This is invalid, as the samples are never random and seldom, if ever, of substantial size. Since the tests assume random samples, and statistical significance is greatly enhanced by large samples, nothing in the typical db test works to yield the result that people can hear a difference. This suggests that the conclusion, and not the methodology or a commitment to "science," is the real purpose.
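
To make the sample-size point concrete, here is a minimal sketch (Python, with hypothetical trial counts) of the binomial test usually applied to same/different or ABX results; the same 60% hit rate that fails to reach significance in a small panel easily reaches it with more trials:

```python
from math import comb

def binomial_p_value(hits: int, trials: int, chance: float = 0.5) -> float:
    """One-tailed probability of scoring at least `hits` out of `trials`
    if the listener is guessing (null hypothesis: no audible difference)."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(hits, trials + 1))

# A 60% hit rate in a typical small session: not "significant" at 0.05 ...
print(binomial_p_value(12, 20))    # ~0.25 -> null hypothesis "accepted"

# ... but the identical hit rate over many more trials is significant.
print(binomial_p_value(120, 200))  # ~0.003 -> difference detected
```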

Without db testing, the advocates suggest, those who hear a difference are deluding themselves: the placebo effect. But were we to use db testing with something other than the same/different technique, and people consistently chose the same component, would we not conclude that they are not delusional? This would test another hypothesis, that some people can hear better than others.

I am probably like most subjectivists, as I really do not care what the outcomes of db testing might be. I buy components that I can afford and that satisfy my ears as realistic. Certainly some products satisfy the ears of more people, and sometimes these are not the positively reviewed or heavily advertised products. Again it strikes me, at least, that this should not happen in the world that the objectivists see. They see the world as full of greedy charlatans who use advertising to sell expensive items which are no better than much cheaper ones.

Since my occupation is as a professor and scientist, some among the advocates of double blind might question my commitment to science. My experience with same/different double blind experiments suggests to me a flawed methodology. A double blind, multiple-component design, especially with a hypothesis that some people are better able to hear a difference, would be more pleasing to me, but even here I do not think anyone would buy on the basis of such experiments.

To use Atkinson’s phrase, I am generally happy and don’t care if the objectivists think I am right. I suspect they have to have all of us say they are right before they can be happy. Well tough luck, guys. I cannot imagine anything more boring than consistent findings of no difference among wires and components, when I know that to be untrue. Oh, and I have ordered additional Intelligent Chips. My, I am a delusional fool!
tbg
Mankind, believing the Bible, ignored the massive bones that kept being discovered. Jefferson charged Lewis and Clark to find out whether such large creatures lived on the Missouri River. Yes, we are all victims of our underlying theories. Darwin explained evolution, and we retheorized where such bones might have come from.

What does this have to do with DBTesting? Nothing.
To the doubters of DBT:

Women are fairly recent additions to professional orchestras. For years and years, professional musicians insisted they could hear the difference between male and female performers, and that males sounded better. Women were banished to the audience. The practice ended only after blind listening tests showed that no one could discern the sex of a performer.

Surely, these studies had as many flaws as blind cable comparisons. Probably more, since they involved live performances by individual people, which are inevitably idiosyncratic.

Would the DBT doubters here have been lobbying to keep women out of orchestras even after the tests? Or would they, unlike the professional musicians of the day, never have heard the difference in the first place?
One thing about being over 60 is that the style of thought in society has changed but not yours. When I was a low-paid assistant professor and wanted ARC equipment for my audio system, I just had to tell myself that I could not afford it, not that it was just hype and fancy faceplates, or bells and whistles, and that everyone knows there is no difference among amps, preamps, etc. DBT plays a role here. Since it finds that people can hear no differences, and since it carries the label of "science," it confirms the no-difference hopes of those unable to afford what they want. My generation's attitudes now result in criticizing other people's buying decisions as "delusional."

I certainly have bought expensive equipment whose sound I hated (Krell) and sold immediately, and other equipment (Cello) that I really liked. I have also bought inexpensive equipment that, despite the "good buy" conclusion in reviews, proved nothing special in my opinion (Radio Shack personal CD player). There is a very low correlation between cost and performance, but there are few inexpensive components that stand out (47 Labs) as good buys. This is not to deny that there are marginal returns for the money you spend, but the logic of consciously getting your money's worth really leads only to the cheapest electronics, probably from Radio Shack, as each additional dollar spent above that cost gives you only limited improvement.

DBTesting, in my opinion, is not the meaning of science; it is a method that can be used in testing hypotheses. In drug testing, since the intervention entails giving a drug, the control group would notice that they were getting no intervention and thus could not benefit. Thus we have the phony pill, the placebo. The science is the controlled, random-assignment, pretest/posttest control design and the hypothesis, based on earlier research and observations of data, that the testing is designed to answer.

If we set aside the question of whether audio testing should be dealt with scientifically, probably most people would say that not knowing who made the equipment you hear would exclude your prior expectations about how a quality manufacturer's equipment might sound. A simple A/B comparison of two or even three amps, with someone responsible for setting levels, is not DBT. Listening sessions need to be long enough, and with a broad enough range of music, to allow a well-based judgment. In my experience, this does remove the inevitable bias of those who own one of the pieces and want to confirm the wisdom of their purchase, but more importantly it does result in one amp being fairly broadly confirmed as "best sounding." I would value participation in such comparisons, but I don't know whether I would value reading about them.

I cannot imagine a money making enterprise publishing such comparisons or a broad readership for them. I also cannot imagine manufacturers willingly participating in these. The model here is basically that of Consumers Reports, but with a much heavier taste component. Consumers Reports continues to survive and I subscribe, but it hardly is the basis of many buying decisions.

My bottom line is that DBT is not the definition of science; same/different comparisons are not the definition of DBT; any methodology that overwhelmingly yields the "no difference" finding despite most people hearing a difference between amps is clearly a flawed methodology that is not going to convince anyone; and finally, people do weigh information from tests and reviews in their buying decisions, but they also have their personal biases. No mumbo-jumbo about DBTesting is ever going to remove this bias.
[W]hat components do you think match up well against really really expensive ones?

That is a loaded question. I know a guy who wanted to find the cheapest CD player that sounded identical to the highly touted Rega Planet. He went to a bunch of discount stores, bought up a half dozen models, and conducted DBTs with a few buddies. Sure enough, most of the units he chose were indistinguishable from the then-$700 Planet. The cheapest? Nine dollars.

That is not a misprint.

Lest you think he and his friends were deaf and couldn't hear anything, they really did hear a difference between the Planet and a $10 model. At that level, quality is hit-or-miss. But I should think that any DVD player with an old-line Japanese nameplate could hold its own against whatever TAS is hyping this month. If they sound different, it's probably because the expensive one doesn't have flat frequency response (either because the designer intentionally tweaked it, or because he didn't know what he was doing).

Amps are a bit trickier, because you have to consider the load you want to drive. But the vast majority of speaker models out there today are fairly sensitive, and don't drop much below 4 ohms impedance. A bottom-of-the-line receiver from a Denon or an Onkyo could handle a stereo pair like that with ease. (Multichannel systems are a different story. But I once asked a well-known audio journalist what he would buy with $5000. He suggested a 5.1 Paradigm Reference system and a $300 Pioneer receiver. He was not joking.)

There are good reasons to spend more, of course. Myself, I use a Rotel integrated amp and CD player. I hate all the extra buttons on the A/V stuff, and my wife finds their complexity intimidating. Plus, I appreciate simple elegance. I also appreciate good engineering. If I could afford it, I'd get a Benchmark DAC and a couple of powerful monoblocks. But that money is set aside for a new pair of speakers.
Tbg:

All of us here are interested in one thing: the truth. If DBT is a fundamentally flawed methodology, its results are no guide to the truth about what sounds good. So if the studies are all flawed, and there are audible differences between amplifiers with virtually the same specs, even if, somehow, no one can detect those differences without looking at the amps, then I'm with you. Likewise, if there isn't anything fundamentally wrong with the studies, and they strongly indicate that certain components are audibly indistinguishable, then you should be with me.

Your own perceptions -- "I can hear a difference and my tastes are all that matters" -- should not trump science any more than your own experiences in general should trump science. I remember seeing ads with athletes saying "Smoking helps me catch my wind." I also recall people saying how smoking made them healthy and live long. Their personal experiences with smoking did not trump the scientific evidence, though. This is just superstition. The Pennsylvania Dutch used to think that if you didn't eat doughnuts on Fastnacht's Day, you'd have a poor crop. Someone had that experience, no doubt. But it was just an accident. Science is supposed to sort accident from true lawful generalization. It's supposed to eliminate bias, as far as possible, in our individual judgments and take us beyond the realm of the anecdote.

Now, if your perception of one component bettering another is blind, then ok. But if you're looking at the amp, then, given what we know about perception, your judgments aren't worth a whole lot.

So... are the studies all flawed? Well, certainly some of the studies are flawed. But, as Pabelson said, the studies all point to the same conclusions. And there are lots of studies, all flawed in different ways. Accident? Probably not.

Compare climate science. There are lots of models of global temperature over the next hundred years, and they differ from each other by a wide margin (10 degrees). They're all flawed models. But they all agree there's warming. To say that the models are flawed isn't enough to dismiss the science as a whole. Same in psychoacoustics.

Long story short: there's no substitute for wading through all of the studies. I haven't done this, but I've read several, and I didn't see how the minor flaws in methodology could account for no one's being able to distinguish cables, for instance.
Qualia, you state, "So, if two amps cannot be distinguished unless you're looking at the faceplates, why buy the more expensive one? Now who finds fault with that reasoning?" My point is that a DBT finding of "no difference" is not truly no difference. It is not a valid methodology, as it is at odds with what people hear even when they cannot see the faceplates. Furthermore, I can hear a difference, and my tastes are all that matters. This is not scientific demonstration.
Leme, I am not at all interested in DBTesting, as I know from personal experience that there are substantial differences between both cables and amps. This is why I would have to say there is real conceptual invalidity to DBTesting. Furthermore, I really don't care what the results would be, but I suspect that a disproportionate percentage of the time DBTests accept the null hypothesis.

Pabelson, I did not mean to say that I put much stake in what a reviewer may say even were I to have agreed with him in the past.

Bigjoe, certainly you can dismiss DBT if you find it invalid. Science has to be persuasive, not merely orthodox. And as I keep saying, this is not a hypothesis-testing circumstance; it is a personal-preference situation. Science is supposed to be value-free, with personal biases not influencing findings, but taste is free of such limitations and of the need to defend them.
One more question for Pabelson:

Since you've obviously read a lot more DBT stuff than I have, I'm interested to know: what's your system? (Or, what components do you think match up well against really really expensive ones?)
Steve: I wouldn't be quite so dogmatic about the lack of differences, for one reason: Many audiophiles don't level-match when they do comparisons. So there really are differences to hear in that case. Of course, a difference you can erase with a simple tweak of the volume knob isn't one worth paying for, in my opinion.
Pabelson:

I think we haven't nearly exhausted all of the non-acoustic mechanisms in play, but the ones you mention are certainly among them, and probably more relevant than the ones I mentioned. My general point was that the little bit of psychology I have studied makes me awfully wary of the "objectivity," or context-independence, of my own perceptual judgments of quality.

It's good to hear you still take a lot of joy in the audio hobby. It remains unknown whether you can take *as much* joy as you would if you weren't such a skeptic!
Several people here seem to mistake the purpose of DBT. The purpose is not necessarily finding the "best" component, although that may be the case, for instance, in Harman's speaker testing. The point is often simply to see if there is any audible difference whatsoever between components. As Pabelson noted way, way back in this thread, if two systems differ with respect to *any* fancy audiophile qualities (presentation, color, soundstage, etc.) then they will be distinguishable. And if they are distinguishable, that will show up in DBT. Ergo, if two systems are NOT distinguishable with DBT, they do not differ with respect to any fancy audiophilic qualities. (That's modus tollens.)

So, if two amps cannot be distinguished unless you're looking at the faceplates, why buy the more expensive one? Now who finds fault with that reasoning?

It's not a matter of "I like one kind of sound, that other guy likes another kind of sound, so to each his own." If no one can distinguish two components, then our particular tastes in sound are irrelevant. There's just no difference to be had.
Tbg: The average consumer cannot really do a blind comparison of speakers, because speaker sound is dependent on room position, and you can't put two speakers in one place at the same time. But I recommend you take a look at the article on Harman's listening tests that I linked to above. If you can't do your own DBTs, you can at least benefit from others'.

I think there's a danger in relying on reviewers because "I agreed with them in the past." First, many audiophiles read reviews that say, "Speaker A sounds bright." Then they listen to Speaker A, and they agree that it sounds bright. But were they influenced in their judgment by that review? We can't say for sure, but there's a good probability.

Second, suppose we do this in reverse. You listen to Speaker A, and decide it sounds bright. Then you read a review that describes it as bright. So you're in agreement, right? Not necessarily. A 1000-word review probably contains a lot of adjectives, none of which have very precise meanings. So, sure, you can find points of agreement in almost anything, but that doesn't mean your overall impressions are at all in accord with the reviewer's.

Finally, if you're interested in speakers, I highly recommend picking up the latest issue of The Sensible Sound, which includes a brilliant article by David Rich about the state of speaker technology and design. It's a lot more of a science than you think. The article is not available online, but if your newsstand doesn't have it (it's Issue #106) you can order it online at www.sensiblesound.com. Believe me, it is worth it.
You guys are missing an important point: double-blind testing is used to determine whether there is an audible difference between two components. Things like cables and amps usually will not show a difference (for cables, they never will).

If there is a difference, you wouldn't use dbt to decide which to choose.

steve
Pabelson and wattsboss, I agree with both of you as my first posting would suggest. I am getting on with my search for a better speaker than the twenty or so that I have tried thus far, and I cannot imagine how DBTesting would help me at all in this quest.

In science we are interested in testing hypotheses to move human understanding along. In engineering we are seeking to apply what is known, limited though it may be. Audio is an engineering problem, and there is no one right way to come up with the best speaker. When validly applied, experiments using blinds are useful for excluding alternative hypotheses. This is not a science, however.

Also, while I read reviews, it is usually those of reviewers whose opinions I have learned to value because my replications of their work have reached the same conclusions. I fully realize that their testing is sharply restricted by the limited time and setups they have. If my testing yields results I like, whether or not I am delusional, I buy and am happy. I suspect that others would share my conclusions, but it is not a big deal if they do not.
Wattsboss - We should not test anything well because it would be impractical to test everything well?
Agree with your thoughts that there could always be doubts about one single test and tester. I think if we could get folks to care about doing meaningful tests though, it would be a start into improving the hobby (and devaluing the snake-oil).
Wattsboss: I'd be careful about accusing others of naivete, if you're going to make posts like this. In a DBT, everything except the units under test is kept constant. So, for example, if you were comparing CD players, you would feed both to the same amp, and on to the same speakers. You wouldn't have to "blind" the associated components, because the associated components would be the same.
My take on DBT is this: if you're talking about running DB tests on amps, preamps, sources, and speakers, what's the point? It's about what sounds "right" to each person, and no amount of testing can show who likes what better. I too hate the word synergy, but it's a real thing.

Now, if we're talking about DB tests on things like gear that has been "upgraded internally" tested against a stock model, or exotic cables against regular wire, there is a lot of merit to a DB test. I would also think DB tests would be great for a lot of the things in our hobby that are deemed "snake oil," like clocks and jars of rocks, and especially interconnects and wires.

You can't just dismiss all DB tests as inconclusive or worthless, nor can you say all DB tests are worthy.

mike.
One question: let's say we get double-blind testing. Would the associated components also be tested blind? ...
So let's see: say we are testing speakers. Should we double-blind two different amplifiers, tube and solid state; two different levels of power for the amplifiers? ... Should we double-blind for the room as well? ... I think folks are naive about how many variables are at stake in trying to make audio reviewing and hearing more "precise" and "scientific" than it ever could be.
But let's suppose we did all this: I submit that people still would question the integrity of reviewers, because people would still disagree on the quality of the sound they hear. And some among us would swear that reviewer X was on the take.
If you find this fascinating, Qualia8, then maybe you're the one who should be taking these sugar pills.

Obviously I agree with you, since you agree with me. There's a lot of expectation bias (aka, placebo effect) and confirmation bias (looking for--and finding--evidence to support your prior beliefs) in hearing perception. But I suspect some high-enders would rather sacrifice the retirement fund than admit that they might be subject to these mechanisms.

To your last point, it is NOT all ruined for me. I can spend my time auditioning speakers, trying to optimize the sound in my room, and seeking out recordings that really capture the ambience of the original venue.
Double blind testing is the ONLY way to test something fairly to remove human preconception, expectation, and visual prejudice. That is why it is used for drug trials, and that is why it should be used for hifi.

Any audiophile who questions whether DBT can produce the most accurate results within the other constraints (time/partnering equipment) of a shootout is not helping advance audio. But then I think most of us here would secretly agree that audio is a hobby with more than its share of snake-oil salesmen.
This Rouvin-Pabelson exchange is fascinating. I agree with Pabelson on just about everything. Perhaps that is because I'm an academic (I'm a philosopher, but I'm also part of the cognitive science faculty b/c of my courses on color and epistemology). Anyway, I'm no psychologist, but I am aware of the powerful external forces shaping perceptual evaluation. So I am especially leery of those extra-acoustical mechanisms, which are, by their very nature, hidden from us.

SOME RELEVANT PSYCHOLOGICAL MECHANISMS TO BEAR IN MIND.

To start with, there's the endowment effect. The experiment takes place at a three-day conference. At the beginning of the conference, everyone is given a mug. At the end of the conference, the organizers offer to buy the mugs back for a certain price. Turns out, people want something like $8 (can't remember the exact number) to give their mug back. But other groups at different conferences are not given the mug; it is sold to them. Turns out, the price they are willing to *pay* for the mug is something like $1. Conclusion: people very quickly come to think the things they have are worth more than things they don't have but could acquire.

This may seem to run counter to our constant desire to swap out and upgrade in search of perfect sound, but it explains the superlatives that people use -- "best system I've ever heard," "sounds better than most systems costing triple"-- when describing mediocre systems they happen to own. (Other explanations for this are also possible, of course.)

Our audiophiliac tendencies are also in part explained by the "choice" phenomenon: when you are faced with a wide variety of options, you're not as happy with any of them as you otherwise would be. When subjects are offered three kinds of chocolate on a platter, they're pretty happy with their choice. But when they're offered twenty kinds, they're less happy even when they pick the identical chocolate. That's us!

Another endowment-like effect, though, and this is what got me to write this post, is one that happens after making a purchasing or hiring decision. After making the decision say, to hire person A over person B, a committee will rate person A *much* higher than prior to the hiring decision, when person B was still an option. In other words, we affirm our choices after making them.

This phenomenon is more pronounced the more sacrifices you make in the course of the decision-making process. In other words, if you went all out to get candidate A, you'll think he's even better. Women know this intuitively. It's called playing hard to get.

In the audio realm, when you spend a couple grand on cables, your listening-evaluation mechanisms will *make* the sound better, because you have sacrificed for it.

So *this* made me wonder whether really expensive cables *do* sound better to those who know what they cost and who made the sacrifice of buying them. If so, then those cables are worth every penny to those who value that listening experience. DBT cannot measure this difference, because it's not a physical difference in the sound. But it is still a *real* difference in the perceptual experiences of the listener. In the one case (expensive cables), your perceptual system is all primed and ready to hear clarity, depth, soundstage, air, presence, and so on. In the other case (cheap cables), your perceptual system is primed to hear grain, edge, sibilance, and so on. And hear them you do!

Best of all would be forgeries, *faked* expensive cables your wife could buy, knowing they were fakes, and stashing the unspent thousands in a bank account. You'd get to "hear" all of this wonderful detail, thinking you were broke, but years later, you'd have a couple hundred grand in your retirement fund!

Sorry for the rambling post, but I am interested to hear what Pabelson has to say. You are missing out, Pabelson. Knowing about the extra-acoustical mechanisms, you cannot "hear" the benefits of expensive cables. It's all ruined for you, as if you discovered your "wonderful" antidepressants were just pricey sugar pills.
"We can't measure sound and make predictions about how it will sound to you, because how it will sound to you depends on too many factors besides the actual sound." This is what I have been saying in addition to less than favorable comments about many subjective reviews. This problem is one of many that equally hampers "objective" reviews.

A closer reading of what I have written would reveal that I am, at best, ambivalent about the whole process of audio reviewing, subjective or objective. Moreover, DBT has yet to produce much of significance beyond the finding that some people can sometimes tell, under some conditions.

"And the idea that you, an amateur audio hobbyist without even an undergraduate degree in psychology, has any standing to declare what is and is not valid..." You have constructed a total absence of "valid" credentials for me, an exercise in "creative writing." "Lack of standing" is the problematic judgment invoked that leads you to invalidate experience, a source of needless angry conflict. It might be somewhat accurate to characterize me as "an amateur audio hobbyist," but you have taken quite a leap to decide that I am "without even an undergraduate degree in psychology," a leap that could not be more inaccurate. At least, you didn’t mention my lack of teeth and eviction from the trailer park.

There are subjective reviews in just about every field. Anyone who takes them for hard fact does not understand what they are. I'd also suggest advanced readings in sensation and perception to better understand the distinction between "what we are able to perceive" and "how we perceive it."

So, I’ll leave you with two thoughts apropos of this discussion. Einstein said, "Not everything that can be counted counts; and not everything that counts can be counted." In Alice in Wonderland, the Dodo said, "Everybody has won, and all must have prizes." Where's Rodney King, anyhow?
Rouvin: There really isn't much point in arguing with someone who assumes his conclusions, and then does nothing but repeat his assumptions. Here's what I mean:

The majority of what we are able to perceive is not amenable to measurement that can be neatly, or even roughly, correlated with perception.

How do you know what you are *able* to perceive (as distinct from what you *think* you perceive)? In the field of perceptual psychology, which is the relevant field here, there are standard, valid ways of answering that question. But it's a question you are afraid to address. Hence your refusal of my challenge to actually conduct a DBT of any sort. And the idea that you, an amateur audio hobbyist without even an undergraduate degree in psychology, have any standing to declare what is and is not valid as a test of hearing perception is pretty risible.

Finally, just to clear up your most obvious point of confusion: There is a difference between "what we are able to perceive" and "how we perceive it." You are conflating these two things, again because you don't want to face up to the issue. "What we are able to perceive" is, in fact, quite amenable to measurement. It's been studied extensively. There are whole textbooks on the subject.

Your harping on subjective reviewing, by contrast, is about "how we perceive it." We can't measure sound and make predictions about how it will sound to you, because how it will sound to you depends on too many factors besides the actual sound. That's why we need DBTs--to minimize the non-sonic factors. And when we minimize those non-sonic factors, we discover that much of what passes for audio reviewing is a lot of twaddle.
Pabelson, I find your posts interesting, though not really responsive to the initial thread by TBG about the place of DBT in audio. Nor have I felt that your posts have been responsive to my similar concerns and to my additional concerns about experimental validity (though I am sure that not all the tests have been invalid), for instance, the very interesting and amusing 1984 BAS article where the Linn godfather not only failed to differentiate the analog from the digitally processed source, he identified more analog selections as digital. But... this was an atypical setup that would not be found in any home. We can't really generalize from it, and this has nothing to do with advocacy of the "subjectivist" viewpoint. If you would be true to your objectivist bona fides, wouldn't you have to agree?

Then, there's the issue, supported by your citations, that there have been DBTs going back years that have demonstrated noticeable differences between individual components.

So, I think there is a background issue, and this was also mentioned in TBG’s initial post. Many adherents of DBT seem to be seeking the very "conformance" that you want to point out in others. That "conformance?" That until the very qualities claimed to exist can be proven to exist they must be assumed not to exist. Intoxicating argument, but ultimately revealing of a distinct bias, the invalidation of the experience of others as an a priori position until they can meet your standard. This "you ain't proved nothin'" approach is especially troublesome when one reads subjective reviews and realizes that the points they raise, creative writing they may well be, could never be addressed by DBT, ABX, or any other similar methodology. The majority of what we are able to perceive is not amenable to measurement that can be neatly, or even roughly, correlated with perception. To claim otherwise is an illusion. Enter the artists with some scientific and technical skill and we have high end audio. Sadly, with them come the charlatans and deluded along with average and "golden eared" folks who hope that they can hear their music sound a bit more like they think they remember it sounding somewhere in the past. Add something like cables and it seems the battle lines are drawn.

I'm a bit suspicious that you might not allow the person who can reliably detect a difference between two components to write whatever he wants in your forthcoming journal. You claim that once the DBT is passed, he can describe a component any way he wants. That doesn't really make sense to me, because a "just noticeable difference" is not the same as being able to notice all of the differences subjective reviewers claim, is it? If someone can tell the real Mona Lisa from a reproduction, even a well executed one, do you really care to hear everything else he thinks about it? I don't. I might want to see it myself, though.

I don’t think there will ever be anything like being able to recreate the exact sonic experience of a live musical performance in a home or studio. What we can hope for are various ways to recreate some reasonable semblance of some aspects of some performances. DBT probably has a place there.

In the meantime, I'd like to suggest a name for your journal: The Absolutely Absolute Sound. I think Gunbei has a supply of blindfolds.
Agaffer: A list of DBT test reports appears here:

http://www.provide.net/~djcarlst/abx_peri.htm

This list is a bit old, but I don't know of too many published reports specifically related to audio components since then. After a while, it became apparent which components were distinguishable and which were not. So nobody publishes them anymore because they're old news.

Researchers still use them. Here's a test of the audibility of signals over 20kHz (using DVD-A, I think):

http://www.nhk.or.jp/strl/publica/labnote/lab486.html

The most common audio use of DBTs today is for designing perceptual codecs (MP3, AAC, etc.). These tests typically use a variant of the ABX test, called ABC/hr (for "hidden reference"), in which subjects compare compressed and uncompressed signals and gauge how close the compressed version comes to the uncompressed.
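
For readers who haven't seen one, here is a toy sketch (Python) of the bookkeeping behind a basic ABX trial; the simulated guessing listener is an invention for illustration, not the software any published test actually used:

```python
import random

def run_abx_trials(n_trials, listener_guess):
    """Score an ABX session: on each trial, X is secretly assigned to
    A or B, and only correct identifications are counted."""
    correct = 0
    for _ in range(n_trials):
        x = random.choice("AB")       # hidden assignment, unknown to all present
        if listener_guess() == x:     # listener hears A, B, and X, then guesses
            correct += 1
    return correct

# A listener who truly cannot tell A from B is just guessing,
# so the score hovers around chance (about 8 of 16):
print(run_abx_trials(16, lambda: random.choice("AB")))
```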

Finally, Harman uses DBTs in designing speakers. Speakers really do sound different, of course, so they aren't using ABX tests and such. Instead, they're trying to determine which attributes of speakers affect listener preferences. The Harman listening lab (like the one at the National Research Council in Canada, whose designers now work for Harman) places speakers on large turntables, which allows them to switch speakers quickly and listen to two or more speakers in the same position in the room. Here's an article about their work:

http://www.reed-electronics.com/tmworld/article/CA475937.html

And just for fun, here's a DBT comparing vinyl and digital:

http://www.bostonaudiosociety.org/bas_speaker/abx_testing2.htm

I think Stan Lipshitz's conclusion is worth noting:

Further carefully-conducted blind tests will be necessary if these conclusions are felt to be in error.
I got no problem being blindfolded for two weeks solid as long as you point me in the general direction of the porcelain amp stand when need be.
Hmm, one of you speaks as if you have participated in and/or have seen the results from many audio DBT tests. Where are these tests held? Where are they reported?
the golden-eared: an anecdote

i am a glenn gould fan. according to his biographers, gould could reliably distinguish (blind) between playback devices in the studio that were indistinguishable to everyone else involved. gould was special in many ways. it wouldn't surprise me if the anecdote were true.

however, i'm not glenn gould. i'll spend my money on components that are distinguishable by ordinary folks like me.
Rouvin, bingo! Validity is the missing concern with DBTs. I also entirely subscribe to your question about where DBTing fits into the reviews that audiophiles want. As I have said, I cannot imagine a DBT audio magazine.

I am troubled by your comments that some DBTing has given positive results. Can you please cite these examples?

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Strawman argument; the only point of DBT is to determine whether there is an audible difference. If there is, let the creative writing begin.

steve
Rouvin: You're the one who says these are badly implemented tests (though you seem to be familiar with only a few). I wouldn't claim they're perfect. But that doesn't make their results meaningless; it leaves their results open to challenge by other tests that are methodologically better. My point is that you can't produce any tests that are both 1) methodologically sound; and 2) in conformance with what you want to believe about audio. And until you do produce such tests, you haven't really got any ground to stand on.

You state that golden ears exist, but at the end of the paragraph you admit that this position is indefensible, so you saved me the trouble. ;-) To your point that these golden ears get averaged out in a large test, you're simply wrong. I've never seen a DBT where individuals got a statistically significant score, but the broader panel did not. When it happens, then we'll worry about it.

So, my position remains that there is surely a place for DBT testing, but even after all the methodological and sampling issues were addressed, I'm still unsure how it fits into the types of reviews most audiophiles want.

They may not fit with what audiophiles want, but that says more about audiophiles than it does about DBTs.

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Once you pass the test, you can describe the Thumper any way you want.
Pabelson,
I think we may be closer than you think on this issue but you seem to want it both ways, a difficulty I see repeatedly in "objectivist" arguments. You say:
" For badly implemented tests, they've yielded remarkably consistent results, both positive and negative." --
all the while insisting on scientific assessment.

Methodologically unsound experiments yield no meaningful results, and a pattern of meaningless results does not matter. Your argument in this regard is emotionally appealing, but it is incorrect.

Moreover, the notion that "DBTs address a prior question: Are two components audibly distinguishable at all?" is also suspect absent appropriate methodology. I notice that in your posts you address reliability and repeatability, important factors without any doubt. Yet you have never spoken to the issue I have raised, validity, and this is the crux of our difference. Flawed methodology can yield repeatable results reliably, but it is still not valid.

And, of course, as you have noted, many DBTs have shown that some components are distinguishable.

The issue beyond methodology, I suspect, is that there are some people who can often reliably distinguish between components. They are outliers, well outside the norm, several standard deviations beyond the mean, even among the self-designated "golden eared." When any testing is done on a group basis, these folks vanish in the group statistics. You can assail this argument on many grounds. It is indefensible except for the virtual certainty that there is a normal distribution of hearing acuity in the population.
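
This outlier point can be made concrete with a toy calculation; below is a minimal sketch (Python, with invented scores) of one listener's individually significant result being washed out in a pooled panel total. Whether this actually happens in published DBTs is exactly what Pabelson disputes above.

```python
from math import comb

def p_at_least(hits, trials, p=0.5):
    """One-tailed binomial probability of `hits` or more correct in `trials`."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(hits, trials + 1))

# One hypothetical "golden ear" scores 14/16; fifteen guessers average 8/16.
golden_hits, golden_trials = 14, 16
group_hits = golden_hits + 15 * 8    # 134 total hits
group_trials = 16 * 16               # 256 total trials

print(p_at_least(golden_hits, golden_trials))  # ~0.002: significant alone
print(p_at_least(group_hits, group_trials))    # ~0.25: vanishes in the panel
```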

So, my position remains that there is surely a place for DBT testing, but even after all the methodological and sampling issues were addressed, I'm still unsure how it fits into the types of reviews most audiophiles want.

In your hypothetical magazine, after DBT establishes that the Mega Whopper is distinguishable from El Thumper Grande, how would either be described? Would there be a DBT for each characteristic?

Freud had a book on religion entitled "The Future of an Illusion," and you may well feel that this is where all of this ultimately leads. I'm not sure that I have an answer to that, but this may well be why Audio Asylum has declared itself a DBT-free zone.
Rouvin: Let me take your two points in order. First:

One, that most DBT tests as done in audio have readily questionable methods – methods that invalidate any statistical testing, as well as sample sizes that are way too small for valid statistics.

Then why is it that all published DBTs involving consumer audio equipment report results that match what we would predict based on measurable differences? For badly implemented tests, they've yielded remarkably consistent results, both positive and negative. If the reason some tests were negative was because they were done badly, why hasn't anyone ever repeated those tests properly and gotten a positive result instead? (I'll tell you why--because they can't.)

Two, and the far more important point to me, do the DBT tests done or any that might be done really address the stuff of subjective reviews?

DBTs address a prior question: Are two components audibly distinguishable at all? If they aren't, then a subjective review comparing those components is an exercise in creative writing. You seem to be making the a priori assumption that if a subjective reviewer says two components sound different, then that is correct and DBTs ought to be able to confirm that. That's faith, not science. If I ran an audio magazine, I wouldn't let anyone write a subjective review of a component unless he could demonstrate that he can tell it apart from something else without knowing which is which. Would you really trust a subjective reviewer who couldn't do that?
Pabelson, interesting challenge, but let’s look at what you’ve said in your various posts in this thread. I’ve pasted them without dates, but I’m sure that you know what you’ve said so far.
"What advances the field is producing your own evidence—evidence that meets the test of reliability and repeatability, something a sighted listening comparison can never do. That’s why objectivists are always asking, Where’s your evidence?"
"A good example of a mix of positive and negative tests is the ABX cable tests that Stereo Review did more than 20 years ago. Of the 6 comparisons they did, 5 had positive results; only 1 was negative."
"It's better to use one subject at a time, and to let the subject control the switching."
"Many objectivists used to be subjectivists till they started looking into things, and perhaps did some testing of their own."

You cite the ABX home page, a page that shows that differences can be heard. Yet I recognize that the differences, when heard, were between components that were quite different, usually meeting the standard you've indicated: much better specs will sound better.

Once you decide something does sound different, is this what you buy? Is different better? You say:
"Find ANYBODY who can tell two amps apart 15 times out of 20 in a blind test (same-different, ABX, whatever), and I’ll agree that those two amps are sonically distinguishable."
Does that make you want to have this amp? Is that your standard?

One of the tests you cite was in 1998, with two systems that were quite different in more than price. Does that lend credence to the DBT argument? On the one hand you point to tests that keep everything but one component the same, with one listener and repeated trials, but then you cite something quite different to impugn subjectivists – not that it's all that hard to do. You also cite a number of instances where DBT has indicated that there is a difference. Which is it? Is there "proof" of hearing differences that has been established by DBT? It certainly appears from the material you have cited that there is. By your argument, if this has been done once, the subjectivists have demonstrated their point. I don't agree, and you really don't appear to, either.

My points were two, and I do not feel that they have been addressed by your challenge. One, most DBT tests as done in audio have readily questionable methods – methods that invalidate any statistical testing, as well as sample sizes that are way too small for valid statistics. Those tests you cite in which differences were found do look valid, but I haven't taken the time to go into them more deeply. Two, and the far more important point to me, do the DBT tests done, or any that might be done, really address the stuff of subjective reviews? I just don't see how this can be done, and I'm not going to try to accept your challenge, "If you know so much ..." Instead, if you know so much about science and psychoacoustics, and you do appear to me to have at least a passing knowledge, why would you issue such a meaningless, conversation-stopping challenge? Experiments with faulty experimental design are refused journal publication all the time by reviewers who do not have to respond to such challenges. The flaws they point out are sufficient.

Finally, I've been involved in this more than long enough to have heard many costly systems, in homes and showrooms, that either sounded awful to my ears or were unacceptable to me in one way or another. The best I've heard have never been the most costly, but they have consistently been in houses with carefully set up sound rooms built especially for that purpose, from designs provided by psychoacoustic objectivists. This makes me suspect that what we have is far better than we know, a point inherent in many "objectivist" arguments. My home does not even come close to that standard in my listening room (and a very substantial majority of the pictures I see of various systems in rooms around the net also seem to fall pretty short). The DBT test setups I have seen have never been in that type of room, either. What effect this would have on a methodologically sound DBT would be interesting. Wouldn't it?
So, Rouvin, if you don't think all those DBTs with negative results are any good, why don't you do one "right"? Who knows, maybe you'd get a positive result, and prove all those objectivists wrong.

If the problem is with test implementation, then show us the way to do the tests right, and let's see if you get the results you hope for. I'm not holding my breath.
DBT as done in audio has significant methodological issues that virtually invalidate any results obtained. With improper experimental design, any statistics generated are suspect. Regularly compounding the statistical issues is sample size, usually quite small, meaning that the power of any statistics generated, even if significant, is quite low, again meaning that the results are not all too meaningful. Add to this the criticism that DBT, as done so far in audio, might be introducing its own set of artifacts that skew results, and we have quite a muddle.

I'm not at all opposed to DBT, but if it is to be used, it should be with a tight and valid experimental design that allows statistics with some power to be generated. Until this happens, DBT in audio is only an epithet for the supposed rationalists to hurl at the supposed (and deluded) subjectivists. Advocates of DBT have a valid axe to grind, but I have yet to see them produce a scientifically valid design (and I am not claiming an encyclopedic knowledge of all DBT testing that has been done in audio).

More interestingly, though, what do the DBT advocates hope to show? More often than not, it seems to be that there is not any way to differentiate component A (say, the $2.5K Shudda Wudda Mega monster power cord) from component B (a stock PC), or component group A (say, tube power amps) from component group B (transistor power amps). Now read a typical subjectivist review waxing rhapsodic on things like soundstage width and height, instrumental placement, micro- and macrodynamics, bass definition across the spectrum, midrange clarity, treble smoothness, "sounding real," etc., etc. Can any DBT address these issues? How would it be done?

You might peruse my posts of 8/13/05 and 8/14/05 about a power cord DBT session, carried out, I think, by a group that was sincere but terribly flawed in how it approached what it was trying to do, to get an idea of how an often cited DBT looks when we begin to examine critically what was done.

http://forum.audiogon.com/cgi-bin/fr.pl?fcabl&1107105984&openusid&zzRouvin&4&5#Rouvin
Agaffer, I agree. I have participated in DBTs several times and have found hearing differences in such short-term settings to be difficult, even though, after long-term listening to several of the units, I clearly preferred one.

I think the real question is why do short-term comparisons with others yield "no difference" results while other circumstances yield "great difference" results. Advocates of DBT say, of course, that this reveals the placebo effect in the more open circumstances where people know what unit is being played. I think there are other hypotheses, however. Double blind tests over a long term with no one else present in private homes would exclude most alternative hypotheses.

The real issue, however, is whether any or many of us care what these results might be. If we like it, we buy it. If not, we don't. This is the bottom line. DBT assumes that we have to justify our purchases to others as in science; we do not have to do so.
I have a huge problem with the concept of DBT with regard to trying to determine the differences, or lack thereof, between audio products. Maybe I'm just slow, but I often have to live with a piece of gear for a while before I can really tell what it can and cannot do.
DBT is great for something like a new medicine. However, it would be worthless if you gave the subjects one pill, one time. The studies take place over a period of time. And that is the problem with DBT in audio. You sit a group of people in front of the setup. They listen to a couple of songs, you switch whatever component, and then play a couple of songs. That just doesn't work. The differences are often very subtle and can't be heard at first.
Which, of course, is the dilemma of making a new purchase. You have to base your decision on short listening periods.
The concept of a DBT for an audio component is great. But I have yet to see how a test would be set up that would be of any value. Looking at test results based on swapping components after short listening periods would never influence my buying decisions. I wouldn't care how large the audience was or how many times it was repeated, any more than I would trust a new drug that was tested with a one-pill dose.
I find this a very interesting topic.

On one hand, it is somewhat accepted that the perfect component imposes no sonic qualities of its own on the passing signal, yet voicing of components is often referred to - particularly in the case of cables.

So, if a component is purposely voiced, then the reproduction cannot be true to the source, can it? Further, if the differences are as obvious as many anecdotally state, it should be no problem to pass BT, DBT, or ABX tests...
My apologies. I took you for the typical DBT-basher. As for amps, assuming you are talking about solid-state amps designed to have flat frequency response, I seriously doubt it matters (in a DBT) what preamp you use, or how expensive it is. If it has the power to drive your speakers, it will sound like any other good solid state amp with enough power to drive your speakers. Or so the bulk of the research suggests.

To your final point, I'm not sure what's in Mr. Porter's system, but the Nousaine experiment at least suggests that he would NOT notice such a swap, assuming you could finesse the level-matching issues. That's not to say that Mr. Porter's system is not right for Mr. Porter--merely that it might be possible for someone else to achieve a Porter-like sound for somewhat less money. And swapping out amps and cables is one thing; I wouldn't even dream of touching his turntable!
In theory, I like the idea of double blind testing, but it has some limitations as others have already discussed. Why not play with some other forms of evaluating equipment?

My first inclination would be to create a set of categories, such as dynamics, rhythm and pace, range, detail, etc. You could have a group of people listen and rate according to these attributes on a scale of perhaps 1 to 5. You could improve the data by having the participants not talk to one another before completing their ratings, by hiding the equipment from them during the audition, and by giving them a reference audition with pre-determined ratings from which the rater could pivot up or down across the attributes.

Yet another improvement would be to take each rating category and pre-define its attributes. For example, ratings for "detail" as a category could be pre-defined as: 1. I can't even differentiate the instruments, and everything sounds like a single tone. 2. I can make out different instruments, but they don't sound natural and I cannot hear their subtle sounds or noises. 3. Instruments are well differentiated, and I can hear individual details such as fingers on the fret boards and the sound of the bow on the violin string. Well, you get the picture. The idea is to pre-define a rating scale based on characteristics of the sound. Notice that terms such as lush or analytical are absent, because they don't themselves really define the attribute. They are subjective conclusions. Conceivably, a blend of categories and their attributes could communicate an analysis of the sound of a piece of equipment, setting aside our conflicting definitions about what sounds "best," which is very subjective. Further, such a grid of attributes, when completed by a large number of people, could be statistically evaluated for consistency. Again, it wouldn't tell you whether the equipment is good or bad, but if a large number of people gave "detail" a rating of #2 and you had a low deviation around that rating, you might get a good idea of what that equipment sounds like and decide for yourself whether those attributes are desirable to you. Such a system would also, assuming there were enough participants over time, flush out the characteristics of equipment irrespective of what other equipment it was used with, by relying upon a large volume of anecdotal evidence. In theory, the characteristics of a piece of equipment should remain consistent across setups, or at least across similar price points.
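
As a sketch of how such a rating grid might be evaluated, here is a minimal example (Python, with invented ratings; the category names and the sd < 1.0 consensus cutoff are assumptions for illustration, not part of the proposal above):

```python
from statistics import mean, stdev

# Hypothetical 1-5 ratings from a listening panel, keyed by category.
ratings = {
    "detail":   [2, 3, 2, 2, 3, 2, 2],
    "dynamics": [4, 2, 5, 1, 3, 4, 2],
}

for category, scores in ratings.items():
    m, sd = mean(scores), stdev(scores)
    verdict = "consensus" if sd < 1.0 else "no consensus"
    print(f"{category}: mean={m:.2f}, sd={sd:.2f} -> {verdict}")

# "detail" clusters tightly around 2, a meaningful panel judgment;
# "dynamics" is spread across the scale, so its mean tells you little.
```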

Lastly, by moving toward a system of pre-defined judgements, one could create some common language for rating attributes. Have you noticed that reviewers tend to use the same vocabulary whether evaluating a $500 piece of gear or a $20,000 piece of gear? So the review becomes judgemental and loses its ability to really place the piece of gear in the spectrum of its possible attributes.

It's not a double blind study, but large doses of anecdotal evidence when statistically evaluated can yield good trend data.

Just an idea for discussion. If you made it this far, thanks for reading my rant :).

Jeff
My point was not to call into question the efficacy of blind testing. I am quite in favor of it. Even when only one element of a system is varied, the results are interesting, and valuable. For instance, if I can pairwise distinguish speakers (blindly) of $1K and $2K, but not be able to distinguish similarly priced amps, or powercords, or what have you, then my money is best spent on speakers. Likewise, if preamps are more easily distinguishable than amps, I'll put my money there. A site that's interesting in this regard is:

http://www.provide.net/~djcarlst/abx_data.htm
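If you want to gauge the significance of scores like the ones reported on pages like that, the usual tool is an exact binomial test against guessing. A minimal sketch (the 14-out-of-20 run is a made-up example, not data from that site):

# Minimal sketch: exact binomial p-value for an ABX run.
# Under the null hypothesis the listener guesses, with p = 0.5 per trial.
from math import comb

def abx_p_value(correct, trials):
    # Probability of scoring at least `correct` by pure guessing.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(abx_p_value(14, 20))  # ~0.058 -- suggestive, but just short of p < 0.05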

I never said DBT is ineffective. It's just that *most* testing ignores the phenomenon that I cited: sameness of sound is non-transitive, i.e., a = b and b = c, but not a = c. If the question is whether a certain component contributes to the optimal audio system, this phenomenon can't be ignored.

Of course scientists studying psychoacoustics are already aware of the phenomenon. I don't think I'm making a contribution to the science here. But the test you cite above is an exception; for the most part, A/B comparisons are done while swapping single components, not large parts of the system. This is fine when you *do* discover differences, because then you know they're significant. But when you don't find differences, it's indeterminate whether there are no differences to be found OR the differences won't show up until other similar adjustments are made elsewhere in the system.

But I am *very much* in favor of blind testing, even in the pair-wise fashion. For instance, I want to know the minimum amount of money I could spend to match the performance of a $20K amp in DBT. Getting *that* close to a $20K amp would be good enough for me, even if the differences between my amp and it would show up when, say, a $1K preamp is simultaneously swapped for a $20K preamp. So where's that point of auditorily near-enough for amps?

I've also learned from DBT where I want to spend my extremely limited cash: speakers first, then room treatment, then source/preamp, then amp, then ic's and such. I'll invest in things that make pair-wise (blind) audible differences over (blind) inaudible differences any day.

Still, for other people here, who are after the very best in sound, only holistic testing matters. Their question (not mine) is whether quality cabling makes any auditory difference at all, in the very best of systems. Same for amps.

Take a system like Albert Porter's. Blindfold Mr. Porter. If you could swap out all the Purist in his system and put in Radio Shack, and *also* replace his amps with the cheapest amps that have roughly similar specs, without his being able to tell, that would be very surprising. But I haven't seen tests like that... the one you mention above excepted.
Troy: Psychoacoustics is well aware of the possibility that A can match B, and B can match C, while A fails to match C. That hardly constitutes a reason to question the efficacy of DBTs.

And you are quite correct that changing both your speaker cables and interconnect(s) simultaneously might make a difference, when changing just one or the other would not. But assuming you use proper level-matching in your cable/wire comparisons, there probably won't be an audible difference, no matter how many ICs you've switched in the chain. (And if you don't use proper level-matching in your cable/wire comparisons, you will soon be parted from your money, as the proverb goes.)
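For anyone wondering what "proper level-matching" amounts to in practice: levels are compared in dB from voltages measured at the speaker terminals, and the commonly cited tolerance is about 0.1 dB, since smaller mismatches can masquerade as quality differences. A minimal sketch of the arithmetic (the voltages are hypothetical):

# Minimal sketch: verify two components are level-matched before comparing.
# The 0.1 dB tolerance is the commonly cited figure; voltages are made up.
from math import log10

def level_difference_db(v_a, v_b):
    return 20 * log10(v_a / v_b)   # voltage ratio expressed in dB

v_a, v_b = 2.83, 2.86              # volts at the speaker terminals (hypothetical)
diff = level_difference_db(v_b, v_a)
print(f"{diff:+.3f} dB", "OK" if abs(diff) <= 0.1 else "re-match levels")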

You might be interested to know that Stereo Review ran an article in June 1998, "To Tweak Or Not to Tweak," by Tom Nousaine, who did listening tests comparing two very different whole systems. (The only similarities were the CD player and the speakers, but in one system the CD player fed an outboard DAC.) The two systems cost $1700 and $6400. The listening panel could hear no difference between the two systems, despite differences in DACs, preamps (one a tube pre), amps, ICs, and speaker cables.

So, contrary to your assertions, this whole question has been studied, and there is nothing new under the sun.
I teach a course on the philosophy of color and color perception. One of the things I do is show color chips that are pairwise indistinguishable. I show a green chip together with another green chip that is indistinguishable from it. Then I take away the first chip and show a third green chip that is indistinguishable from the second. And then I toss the second chip and introduce a fourth chip, indistinguishable from the third. At this point, I bring back the first green chip and compare it with the fourth. The fourth chip now looks bluish by contrast, and is easily distinguished from the original. How does that happen? We don't notice tiny differences, but they add up to noticeable differences. We can be walked, step-wise, from any color to any other color without ever noticing a difference, provided our steps are small enough!
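You can mimic the chip demonstration numerically: model "indistinguishable" as "differs by less than a just-noticeable difference," and the relation fails to be transitive by construction. A minimal sketch (the JND and step sizes are arbitrary units):

# Minimal sketch: a threshold model of "indistinguishable" is not transitive.
JND = 1.0                      # just-noticeable difference, arbitrary units

def same(x, y):
    return abs(x - y) < JND    # below threshold -> perceived as identical

chips = [0.0, 0.6, 1.2, 1.8]   # each step is smaller than the JND
print(all(same(a, b) for a, b in zip(chips, chips[1:])))  # True: neighbors match
print(same(chips[0], chips[-1]))  # False: first and last chips visibly differ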

Same for sound, I bet. That's why I don't understand the obsession with pair-wise double-blind testing of individual components. Comparing two amps alone may not yield a discriminable difference. Likewise, two preamps might be pairwise indiscriminable. But the amp/preamp combos (there will be four possibilities) may be *noticeably* different from one another. I bet this happens, but the tests are all about isolating one component and distinguishing it from a competitor, which is exactly wrong!

The same goes for wire and cable. It may be difficult to discern the result of swapping out one standard power cord or set of ic's or speaker cables. But replace all of them together and then test the completely upgraded set against the stock setup and see what you've got. At least, I'd love to see double-blind testing that is holistic like this. I'd take the results very seriously.
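To see how this holistic design scales, here is a minimal sketch that enumerates the whole-system configurations a blind panel would have to compare; the slot names and "stock"/"upgrade" labels are placeholders:

# Minimal sketch: enumerate whole-system configurations for holistic tests.
# With n two-way slots there are 2**n systems, not just n pairwise swaps.
from itertools import product

slots = {
    "amp":     ["stock", "upgrade"],
    "preamp":  ["stock", "upgrade"],
    "cabling": ["stock", "upgrade"],
}

configs = [dict(zip(slots, choice)) for choice in product(*slots.values())]
print(len(configs))    # 8 complete systems from 3 two-way slots
for config in configs:
    print(config)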

From the holistic tests, you can work backward to see what is contributing to good sound, just as you can eventually align all color chips in the proper order, if presented with the whole lot of them. But what needs to be compared in the first place are large chunks of the system. Even if amp/pre-amp combos couldn't be distinguished, perhaps amp/pre-amp combos with different cabling could be (even though none of the three elements used distinguishable products!). I want to see this done. Double blind.

In short: unnoticeable differences add up to *very* noticeable differences. Why this non-additive nature of comparison isn't at the forefront of the subjectivist/objectivist debate is a complete mystery to me.

-Troy
Gregm, I do not know how many out there experienced the Murata demonstration at CES 2004, but it was a great deal like what you describe. Initially, the speakers played a passage. Then the super tweeters were switched in and the passage replayed. The ten people in the audience all expressed a preference for the version with the super tweeters. There was much conversation, but ultimately someone asked to hear the super tweeters alone. The demonstrator said we were already hearing them.

When we all refocused on the sound, all that we could hear was an occasional spit, tiz, snap. There was no music at all. The Muratas come in at 15 kHz. I left and dragged several friends back for a second demonstration, with exactly the same results.

Would there be any benefit to having this done single or double blind? I don't think so. Do we need an understanding of how we hear such high frequency information, without which it might be a placebo or Hawthorne effect? I don't.

But this experience is quite at odds with the article that Pabelson cited. What is going on? I certainly don't know, save to suggest that there is a difference in what is being asked of subjects in the two tests.
ELdartford sez
the ear senses RATE-of-change of pressure (...) Have you heard any other explanation?
Well, 1) about 20 years ago a French professor (I forget the name) claimed findings that the bones contribute to our perception of very high frequencies. 2) There seems to be a case for the interaural mechanism working as a whole -- not ONE ear alone, but both being excited together.

OTOH, it's also been established that the audibility of PURE tones diminishes with age in the higher frequencies. So here we're talking about "sound in context": say, the harmonics of an instrument, where the fundamental and certain harmonics are well within our pure-tone hearing range and some of the related info is outside an individual's "official" (pure tone) audible range.

The strange thing is that our ears work as a low-pass filter; so some people speculate that it's the COMMON interaural excitation that does the trick...
For this to happen (let's ignore the possible contribution of the bone structure for now), wouldn't it mean that our interaural "mechanism" has to be situated in the DIRECT path (sweet spot) of those frequencies (remember, our acuity falls dramatically, ~20-30 dB, up there)? If so, then moving our head slightly would eliminate this perception.

So, let's assume a super high frequency transducer with excellent dispersion characteristics and thereby eliminate the need for that narrow sweet spot (a Murata is quite good, btw).

It is my contention (but I have no concrete evidence) that three things are happening in conjunction:
a) the high frequency sound is loud enough to overcome our reduced acuity up high (at -60 dB perception our ear would basically reject it)
b) the sounds in our "official" audible frequency range are rendered more palpable (for want of a better word) because the super transducer's distortion points (upper resonance) have moved very far away (~100 kHz for a Murata) -- hence the "perception" of positive effects. This still relates to our "official" range of hearing.
c) there is a combined excitation of aural and other, structural, mechanisms that indicates the presence of high frequencies -- one that we cannot, however, qualify or explain (our hearing is a defense and guidance mechanism geared towards perceiving and locating).
Even at (c) there is a dilemma: in a small experiment in France, some subjects were asked to put one ear close to a super tweeter and declare whether they perceived anything. Inconclusive (some did, some didn't, no pattern). BTW, I did a similar thing and did perceive energy, or the lack of it -- with some DELAY, however, after the tweeter STOPPED producing sound (joining Eldartford's idea).
Subjects were then asked to move away from the transducer and listen normally (stereo), just by casually sitting on a couch in front of the speakers as one would do at home. Everyone "heard" the supertweeter playing. Amazingly, only the s-tweet was connected (at 16 kHz -- very high up for sound out of any other context).
I find this fascinating.
Tbg...I also can "hear" the effect of tweeters/supertweeters operating well above the measured bandwidth of my 67-year-old ears. (I first noticed this general effect, at higher frequencies, when I was much younger.) My explanation is that the ear senses the RATE-of-change of pressure, as well as the change of pressure itself. The high rate of change of a 20 kHz signal can be sensed, even if the smoothly changing pressure of a 14 kHz signal is inaudible. The experience we share is common. Have you heard any other explanation?
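For what it's worth, the rate-of-change idea is easy to quantify: a pure tone A*sin(2*pi*f*t) has a peak pressure slew rate of 2*pi*f*A, which grows linearly with frequency even at constant amplitude. A minimal sketch with the two frequencies above:

# Minimal sketch: peak slew rate of a pure tone scales with frequency.
# For p(t) = A*sin(2*pi*f*t), the peak of dp/dt is 2*pi*f*A.
from math import pi

A = 1.0                            # equal amplitude for both tones
for f in (14_000, 20_000):         # Hz, the frequencies mentioned above
    print(f"{f} Hz: peak dp/dt = {2 * pi * f * A:,.0f} (pressure units/s)")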
For those embracing DBT as simple self-endorsement, I am dismissive.

No objectivists of my acquaintance (and I am acquainted with some fairly prominent ones), "embrace DBT as simple self-endorsement." A number of them, myself included, were subjectivists until we heard something that just didn't make sense to us. I know of one guy (whose name you would recognize) who was switching between two components and had zeroed in on what he was sure were the audible differences between them. Then he discovered that the switch wasn't working! He'd been listening to the same component the whole time, and the differences, while quite "obvious," turned out to be imaginary. He compared them again, blind this time, and couldn't hear a difference. He stopped doing sighted comparisons that day.

Research psychologists did not adopt blind testing because it gave them the results they wanted. They adopted it because it was the only way to get reliable results at all. Audio experts who rely on blind testing do so for the same reason.

Final thought: No one has to use blind comparisons if they don't want to. (Truth be told, while I've done a few, I certainly don't use them when I'm shopping for audio equipment.) Maybe that supertweeter really doesn't make a difference, but if you think it does, and you're happy with it, that's just fine. Just don't get into a technical argument with those guys from NHK!
I started the thread because I am curious about those who doubt others' abilities to hear the benefits of some components and wires. Since many proponents can point to only a few examples of DBT and nevertheless seem confident of the results, I assumed that they saw DBT as endorsing their personal beliefs. Furthermore, my personal experiences with DBT same/different setups have been that I too could not be confident that my responses were anything other than random. But my experiences with single blind tests, in which several components were compared, have been more favorable, with a substantial consensus on the surprisingly best component.

Speakers have always been a problem for me. Some are better in some regards and others in other areas. I suspect that, within the limits of what we can afford, each of us picks our poison.

I did read your referenced article and found it very interesting and troublesome, as I use a Murata super tweeter, which only comes in at 15 kHz and extends to 100 kHz. I am 66 and have only limited hearing above 15 kHz, yet in a demonstration I heard the benefits of the super tweeter, even though there was little sound and no music coming from it when the main speakers were turned off. Everyone else in the demonstration heard the difference also. I know that the common response by advocates of DBT is that we were influenced by knowing when the tweeters were on.

I must admit that I am confident of what I heard and troubled by my not hearing a difference in a DBT. Were this my area of research rather than my hobby, I would no doubt focus on the task at hand for subjects in DBTs as well as the testing apparatus. My confidence is still in human ears, and I suspect that this is where we differ. I guess it is a question of the validity of the test.

For a sincere DBTer, such as yourself, I am not being truculent. For those embracing DBT as simple self-endorsement, I am dismissive.