Why is Double Blind Testing Controversial?


I noticed that "double blind testing" of cables is a controversial topic. Why? A/B switching seems like the only definitive way of determining how one cable compares to another, or to any other component such as speakers. While A/B testing (and particularly double blind testing, where you don't know which cable is A or B) does not show the long-term listenability of a cable or other component, it does show the specific and immediate differences between the two: whether there are differences at all, how slight they are, and how much they matter. It also seems obvious that without knowing which cable you are listening to, you eliminate bias and preconceived notions. So why is this a controversial notion?
moto_man

Showing 4 responses by bomarc

Drubin's right about double-blind: it means that nobody in the room knows which is which. And researchers use it because they've learned that there are all sorts of ways that someone can subconsciously indicate which is which to whoever is actually doing the comparing. If you want to be absolutely sure that there's no outside influence (intentional or not) and that you're making your decisions based only on the sound, double-blind is essential.

That said, the main reason DBTs are controversial is that they tend to produce results that are at odds with the received wisdom of audiophilia.
Socrates: You've asked a mouthful of questions. I'd suggest you start out with this site: www.pcabx.com, where you can download software that will allow you to conduct your own DBTs.

I don't want to get into statistics, except to say that's usually not the weak link in a DBT. As for the "fallibility of science," that's not the way I'd put it. I'd say that science is never finished, and it can always discover something new, or find that something once "proven" to be right is in fact wrong. Science, in short, is the best explanation we have right now for whatever phenomenon we wish to explain. But you can't just wish it away. Current knowledge stands as knowledge--as fact--until somebody comes along with new knowledge that refutes it.

That said, anyone--and I mean anyone--who does serious research on either human hearing or sound reproduction uses DBTs--and ONLY DBTs. No one in the scientific community would think of doing a listening test any other way, because such tests are absolutely necessary to isolate and compare only the sound.
Thanks, Rzado, for the refresher course. Let me try to summarize for anyone who fell asleep in class. In a DBT, if you get a statistically significant result (at least 12 correct out of 16 in one of Rzado's examples), you can safely conclude that you heard a difference between the two sounds you were comparing. If you don't score that high, however, you can't be sure whether you heard a difference or not. And the fewer trials you do, the more uncertain you should be.
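
For the curious, here's a quick Python sketch of the arithmetic behind that 12-of-16 criterion (my own illustration, not from Rzado's post): under the null hypothesis that you're just guessing, each trial is a fair coin flip, so the chance of 12 or more correct is the upper tail of a Binomial(16, 0.5) distribution.

from math import comb

def p_value(correct, trials, p_guess=0.5):
    # One-sided probability of doing this well or better by pure guessing
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))

print(p_value(12, 16))  # ~0.038, below the usual 0.05 significance threshold
print(p_value(11, 16))  # ~0.105, not significant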

This doesn't mean that DBTs are hopelessly inconclusive, however. Some, especially those that use a panel of subjects, involve a much higher number of trials. Also, there's nothing to stop anyone who gets an inconclusive result from conducting the test again. This can get statistically messy, because the tests aren't independent, and if you repeat the test often enough you're liable to get a significant result through dumb luck. But if you keep getting inconclusive results, the probability that you're missing something audible goes way down.
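
To put a rough number on that "dumb luck" (my figures, assuming the 12-of-16 criterion above): each repeat of such a test has about a 3.8% chance of coming up "significant" by guessing alone, so the chance that at least one of N repeats does grows as 1 - (1 - 0.038)^N.

alpha = 2517 / 65536  # P(12 or more correct out of 16) under pure guessing
for n_repeats in (1, 5, 10, 20):
    # Chance that at least one of n_repeats tests is "significant" by luck
    print(n_repeats, 1 - (1 - alpha)**n_repeats)
# Prints roughly 0.04, 0.18, 0.32, 0.54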

To summarize, a single DBT can prove that a difference is audible. A thousand DBTs can't prove that it's inaudible--but the inference is pretty strong.

As for my statement about statistics not being the weak link, I meant that there are numerous ways to do a DBT poorly. There are also numerous ways to misinterpret statistics, in this or any other field. Most of the published results that I am familiar with handle the statistics properly, however.
Rzado: My point on retesting is this: If something really is audible, sooner or later somebody is going to hear it, and get a significant response, for the same reason that sooner or later, somebody is going to flip heads instead of tails. If you keep getting tails, eventually you start to suspect that maybe this coin doesn't have a heads. Similarly, if you keep getting non-significant results in a DBT, it becomes reasonable to infer that you probably (and we can only say probably) can't hear a difference.
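
To illustrate that inference with made-up numbers (the 70% hit rate is my assumption, not anything measured): suppose a listener really can pick the right cable 70% of the time. A single 16-trial test then fails to reach 12 correct about 55% of the time, so one null result means little--but the chance that ten independent tests all miss is well under one percent.

from math import comb

def p_miss(trials=16, need=12, p_hear=0.7):
    # Probability that a genuinely audible difference fails one DBT,
    # i.e. fewer than `need` correct out of `trials` at hit rate p_hear
    return sum(comb(trials, k) * p_hear**k * (1 - p_hear)**(trials - k)
               for k in range(need))

m = p_miss()  # ~0.55 for a single test
for n_tests in (1, 3, 10):
    print(n_tests, m**n_tests)  # ~0.55, ~0.17, ~0.0025: odds all N tests miss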

As for published studies, the ones I've seen (which may not be the same ones you've seen) generally did get the statistics right. What usually happens is that readers misinterpret those studies--and both sides of The Great Debate have been guilty of that.