Reviews with all double blind testing?


In the July 2005 issue of Stereophile, John Atkinson discusses his debate with Arnold Krueger, who, Atkinson suggests, fundamentally wants double-blind testing of all products in the name of science. Atkinson goes on to discuss his early advocacy of such methodology and his realization that the conclusion it produced, that all amps sound the same, proved incorrect in the long run. Atkinson's double-blind test involved listening to three amps, so it apparently was not the typical same/different comparison advocated by proponents of blind testing.

I have been party to three blind tests and several "shootouts," which were not blind and thus resulted in each component having advocates, as everyone knew which was playing. None of these ever resulted in a consensus. Two of the three blind tests were same/different comparisons; neither resulted in a conclusion that people could consistently hear a difference. The third was a comparison of about six preamps, and here there was a substantial consensus that the Bozak preamp surpassed more expensive preamps, with many designers of those preamps involved in the listening. In every case there were individuals at odds with the overall conclusion, in no case were those involved a random sample, and in no case were more than 25 people involved.

I have never heard of an instance where "same versus different" methodology concluded that there was a difference, yet apparently comparisons of multiple amps, preamps, etc. can result in one being generally preferred. I suspect, however, that those advocating DBT mean only "same versus different" methodology. Do the advocates of DBT really expect that the outcome will always be that people can hear no difference? If so, is it that conclusion that underlies their advocacy, rather than the supposedly scientific basis for DBT? Some advocates claim that if a DBT found people capable of hearing a difference, they would no longer be critical, but is this sincere?

Atkinson puts it this way: the double-blind test advocates want to be right rather than happy, while their opponents would rather be happy than right.

Tests of statistical significance also get involved here, as some people can hear a difference; but if they are insufficient in number to achieve statistical significance, proponents say we must accept the null hypothesis that there is no audible difference. This is invalid, as the samples are never random and seldom, if ever, of substantial size. Since the tests properly apply to random samples, and statistical significance is greatly enhanced by large samples, nothing in the typical DBT works to yield the result that people can hear a difference. This suggests that the conclusion, and not the methodology or a commitment to "science," is the real purpose.
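To make the arithmetic concrete, here is a minimal sketch, in plain Python, of how little statistical power a typical small same/different test has. The trial count, the listener's true detection rate, and the significance level are all assumed for illustration; they are not taken from any actual test.

```python
# Power of a short same/different (forced-choice) listening test.
# Assumed numbers: 10 trials, one-sided alpha of 0.05, and a listener
# who genuinely hears the difference 70% of the time.
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_trials = 10
alpha = 0.05

# Smallest score that counts as "significant" under the null (guessing, p = 0.5)
k_crit = next(k for k in range(n_trials + 1)
              if binom_tail(n_trials, k, 0.5) <= alpha)

# Power: chance that the 70%-accurate listener actually reaches that score
power = binom_tail(n_trials, k_crit, 0.7)
print(f"need {k_crit}/{n_trials} correct; power = {power:.2f}")  # 9/10; about 0.15
```

On these assumed numbers, a listener who really does hear the difference seven times out of ten still "fails" the test about 85% of the time, which is the point: the deck is stacked toward the null result.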

Without DBT, the advocates suggest, those who hear a difference are deluding themselves: the placebo effect. But were we to use a double-blind design other than the same/different technique, and people consistently chose the same component, would we not conclude that they are not delusional? This would test another hypothesis, that some can hear better.

I am probably like most subjectivists, as I really do not care what the outcomes of db testing might be. I buy components that I can afford and that satisfy my ears as realistic. Certainly some products satisfy the ears of more people, and sometimes these are not the positively reviewed or heavily advertised products. Again it strikes me, at least, that this should not happen in the world that the objectivists see. They see the world as full of greedy charlatans who use advertising to sell expensive items which are no better than much cheaper ones.

Since my occupation is as a professor and scientist, some among the advocates of double-blind testing might question my commitment to science. My experience with same/different double-blind experiments suggests to me a flawed methodology. A double-blind multiple-component design, especially with a hypothesis that some people are better able to hear a difference, would be more pleasing to me, but even here, I do not think anyone would buy on the basis of such experiments.

To use Atkinson’s phrase, I am generally happy and don’t care if the objectivists think I am right. I suspect they have to have all of us say they are right before they can be happy. Well tough luck, guys. I cannot imagine anything more boring than consistent findings of no difference among wires and components, when I know that to be untrue. Oh, and I have ordered additional Intelligent Chips. My, I am a delusional fool!
tbg
For someone who claims to be knowledgeable about research methods, you seem woefully insensitive to the need for your measures to validly assess the theoretical concept they are supposed to measure. Instead you make very unscientific appeals to authority, which is perhaps the worst scientific infraction.

You have demonstrated that there is insufficient tangible data to dismiss the criticism that DBT is inapplicable to questions of what sounds best, the kind of judgment that could be shared among customers. Until the obvious disparity between what people hear and what DBT shows is resolved, no one is going to make buying decisions based on DBT. Perhaps you do, but I doubt it.

I am off to CES, so I will not be monitoring further useless appeals to authority.
Your evidence and appeals to "what scientists already knew" authority are not the way to make your conclusions broadly accepted.

Rest assured, I have no illusions about the possibility of convincing someone who, despite a complete lack of knowledge about the field, nonetheless feels qualified to assert that a test methodology used by leading experts in the field for decades lacks "face validity."

I'm just demonstrating, to anyone who might be reading this with an open mind, that the people who carp about DBTs in audio threads have neither an understanding of the issue nor a shred of real tangible data to support their beliefs.
As I have said too many times before, were DBTs that were not same/different tasks used, and were they to show no differences, many who view this as science would be inclined to accept that as a valid measure of sounding different and better. Same/different questions over brief periods do not give results that have face validity.

Again, this discussion should be laid to rest. Your evidence and appeals to "what scientists already knew" authority are not the way to make your conclusions broadly accepted. Were this a matter of what would cure cancer, etc., there probably would be a need to resolve what an appropriate test is, but it is not. As such, it is not relevant to discussions on Audiogon or AudioAsylum.
I merely would state that I and many others reject that DBT validly assesses sonic difference among cables, etc. Where is your demonstration of face validity or any demonstration of validity?

Where to begin? First, we can physically measure the smallest stimulus that can excite the auditory nerve and send a signal to the brain. It turns out that subjects in DBTs can distinguish sounds of approximately that same magnitude. This shows that DBTs are sensitive enough to detect the softest sounds and smallest differences the ear can detect.

To look at it another way, basic physics tells us what effect a cable can have on the signal passing through it, and therefore on the sound that emerges from our speakers. And basic psychoacoustics tells us how large any differences must be before they are audible. DBTs of cables match this basic science quite closely. When the measurable differences between cables are great enough to produce audible differences in frequency response or overall level, the cables are distinguishable in DBTs. When the measurable differences are not so great, the DBTs do not produce positive results.
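To put rough numbers on that divider argument, here is a minimal back-of-the-envelope sketch. The per-conductor resistances are standard values for copper wire; the 3-meter run, the 8-ohm nominal load, and the 3-ohm impedance dip are assumed for illustration, not measurements of any particular cable or speaker.

```python
# Series cable resistance forms a voltage divider with the speaker load,
# so the response error varies as the speaker's impedance swings.
from math import log10

def insertion_loss_db(cable_ohms, load_ohms):
    """Attenuation (dB) of the signal at the speaker terminals."""
    return 20 * log10(load_ohms / (load_ohms + cable_ohms))

def loop_resistance(ohms_per_m, length_m):
    return 2 * length_m * ohms_per_m  # two conductors: out and back

for gauge, r_per_m in [("12 AWG", 0.00521), ("24 AWG", 0.0842)]:
    r = loop_resistance(r_per_m, 3.0)   # 3 m run (assumed)
    hi = insertion_loss_db(r, 8.0)      # nominal 8-ohm region
    lo = insertion_loss_db(r, 3.0)      # impedance dip (assumed)
    print(f"{gauge}: loop R = {r:.3f} ohm, "
          f"response variation = {abs(hi - lo):.2f} dB")
```

On these assumptions, the heavy zip cord produces only a few hundredths of a dB of frequency-response variation across the impedance swing, while three meters of very thin wire produces nearly a full dB, which is the sort of difference psychoacoustics treats as potentially audible.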

That's how validation is done: we check the results of one test by comparing them to knowledge determined in other ways. DBTs of audio components came late to the party. All they really did was confirm things that scientists already knew.
You said, "As you said, what does it matter to you if scientists say your cables are indistinguishable from zipcord?" I would take that to mean that you meant this.

I merely would state that I and many others reject that DBT validly assesses sonic difference among cables, etc. Where is your demonstration of face validity or any demonstration of validity?

My faculty lounge was also amused that I had any confidence in experiments, which they view as neither isomorphic nor generalizable to real life. They are always on my case for approving Psych. proposals that rely on the forced participation of students taking Psych. courses. They are enamoured with econometric modeling that usually assumes humans are rational. I have never found that humans maximize much, other than perhaps taking the lazy way out, such as voting for the political party they adopted from their parents.
TBG, I submit that your 'better sounding' cables are distinguishable from simple zip cord precisely because their frequency-response variation is large enough that one can hear the difference in a DBT. A zip cord of sufficient gauge (low resistance) to pass the current necessary to run the speakers, without affecting the damping, will be much more linear than your 'better sounding' cables. DBT will distinguish between sufficiently different-sounding cables, but it will also show when there is no significant difference.
With respect, Bob P.
Who's misrepresenting what, TBG? I never said cables can't sound different. I cited an article earlier that did 6 cable comparisons, and 5 of them turned out positive. I've corrected your misstatements about this previously. Please don't repeat them again.

Just for the record, what DBTs actually demonstrate is that cables are audibly distinguishable only when there are substantial differences in their RLC values. For most cables in most systems, that is rarely the case. Exceptions may include tube amps with weird output impedances, speakers with very difficult impedance curves, and yards and yards of small-gauge cable.

No finding is ever proven; rather, it is tentatively accepted unless further data or studies using different methodologies suggest an alternative hypothesis.

Exactly. So where's your data? Where are your studies?

Any fool's testing would indicate that to be untrue, even if only in sighted comparisons.

The faculty lounge is most amused.
Pabelson, I do wish it would die, but you continue to misrepresent what science is and who best represents it. There is no evidence anywhere, including from your sacred DBTesting, that demonstrates that cables don't sound different. You are not the authority who could declare that science has proven something for decades. No finding is ever proven; rather, it is tentatively accepted unless further data or studies using different methodologies suggest an alternative hypothesis. Robustness also is not of much use, except to suggest that replications have often been done.

It is just the case that I will not concede that scientists or anyone else have shown that my better-sounding cables are indistinguishable from zip cord. Any fool's testing would indicate that to be untrue, even if only in sighted comparisons.
TBG: This thread had been put to bed. It was dormant for two weeks. I'm not the one who revived it. And if it doesn't matter to you, why do you keep posting? Go buy your expensive cables, and just enjoy the pleasure they give you. As you said, what does it matter to you if scientists say your cables are indistinguishable from zipcord?
Henry: Go back and re-read the thread. I provided a link to a whole list of articles on DBTs, including tests of cables, amps, CD players, tweaks, etc. That's why I'm on solid ground in demanding that the opponents of DBTs do the same. As of yet, no one has come up with a single experiment anywhere disputing what those tests have shown. Not one.

Science isn't done by arguing about methodology in the abstract. It's done by developing a better methodology and producing more robust results with it. People like Rouvin wouldn't even know how to do that. And the people who do know how to do that aren't doing it, because they have better uses of their time than re-proving something that's been settled science for decades. If you think it isn't settled, then it's up to you to come up with some evidence that unsettles it.
What is laughable in the Psych. Dept. is not authoritative. Data has to be presented to justify that DBT validly assesses sound differences among components. DBT lacks face validity, as most people can hear differences. You, sir, are the one guilty of scientific error, no matter how much you protest that others are pseudoscientific.

But more fundamentally, we are not engaged in science when picking wine, cars, clothing, houses, wives, or audio equipment, so Charlie is right. Put this to bed. Neither of us is convincing the other, nor ever will.
data is the heart of science.

And this thread, now at over 170 posts, still doesn't contain a shred of reliable, replicable data demonstrating audible differences between components that can't be heard in standard DBTs.

There are many reasons to believe that as applied to audio gear, this methodology does not validly assess the hypothesis that some components sound better.

Name one. Check that. Name one that won't get you laughed out of the Psych Dept. faculty lounge.

You, sir, also have no evidence that is intersubjectively transmissible.

I don't need "evidence that is intersubjectively transmissible," because I'm not changing the subject. The subject is hearing, and what humans can and cannot hear. In order to argue that DBTs can't be used for differences in audio gear, you have to claim that human hearing works differently when listening to audio gear than it does when listening to anything else. That's about as pseudoscientific as it gets.
Pabelson, I think the "you have no proof" counterargument falls flat when you have not provided any yourself. It's one thing to have some DBT results, but what Rouvin is pointing out is that such tests in isolation tell us nothing without a substantially larger sample size, statistical significance testing, etc., which is the scientific method, or what you call empiricism (a few isolated DBTs cannot really be considered empirical evidence). Ergo, asking reviewers to be subjected to such a test and then provide their normal reviews is equally misleading.
Pabelson, data is the heart of science. To gather it, one has to have operationalizations of the concepts in one's hypothesis, which involves methodology. Your distinction is not meaningful.

You are always justifying DBTs as often used in perceptual psychology. Such appeals are unscientific appeals to authority. There are many reasons to believe that as applied to audio gear, this methodology does not validly assess the hypothesis that some components sound better.

You, sir, also have no evidence that is intersubjectively transmissible. Furthermore, as I have said repeatedly, I would not care anyway. I buy what I like and need not prove anything to you or others wrapping themselves in the notion that they are the scientists and those who take exception to them are unscientific.
Every scientific field has its own methodology, Rouvin. If you had made an effort to acquaint yourself with the rudiments of perceptual psychology, you'd be in a better position to pontificate on it.

By the way, methodology is NOT at the heart of science. Empiricism is. Methodology is just a means to an end. Empiricism demands reliable, repeatable evidence. You still haven't got any.
Sadly, anyone who views looking at methodological issues as quibbling does not understand that methodology is at the heart of science. Boring, tedious, surely, but a necessary condition to call something science. Your question, "Why in the world would you need a control group in a perception test?" reveals that you need to brush up on your basic science.
For a guy who doesn't believe in intelligent design, Rouvin, you practice its methods to perfection. You offer no evidence of your own--no tests, no results, nothing that can be replicated or disproved. Instead, you quibble with the "methodology," which you seem substantially uninformed about ("e.g., lack of random assignment or no control groups, making these experiments invalid scientifically"--Why in the world would you need a control group in a perception test?)

We are speaking different languages, Rouvin. DBT advocates are speaking the language of science. You are not.
Rouvin, I substantially agree, of course. I agree moreover about the liabilities of publish or perish in academia and its effect on research, even though I am in a field with no commercial interests, other than public polling.

I do study public policy also, including the impact of creationism, or intelligent design as it is now called. It is awkward to get good state data on science degrees issued before and after adoption of anti-evolution policies, but the worst states in terms of failing to teach evolution have not experienced a decline in science degrees. They never had many in Kansas, for example. It is much like abortion restrictions: the states that adopt such restrictions are those with few abortions, and they experience no decline thereafter. Where abortion is common, no politician would risk introducing a restriction or voting for one.

I too have been struck by why those advocating DBT think anyone need pay attention to the results when buyers obviously hear differences that cause them to buy. Anyone who trusts reviewers for anything more than suggestions of what to give a listen is bound to be disappointed.
I’ve gone over this debate and would like to summarize many of the points made.

As to DBT there may be:
1. Problems with methodology, per se, in audio;
2. Problems with most DBT that has been done, e.g., lack of random assignment or no control groups, making these experiments invalid scientifically;
3. Problems with particular experimental designs that are unable to yield meaningful results;
4. Sample problems, such as insufficient sample size, non-random samples;
5. Statistical problems making interpretation of results questionable.

All of these problems interact, making the results of most DBTs in audio scientifically meaningless.

Advocates of DBT have been especially vociferous in this forum, but what have they actually said to respond to these criticisms? Virtually nothing beyond "No!" or "Where’s your proof?"

The "proof" of their position cited has been interesting, but it has been a reporting on the power of "sham" procedures or other stories that do not meet the guidelines necessary for a DBT procedure to qualify as science.

At the same time, they call DBT science and maintain the supremacy of science. Calling something science without strictly adhering to scientific procedures, unfortunately, is not science, and this is the case with DBT in audio far more often than not. At this point it is more akin to the claim that intelligent design is science than to science itself. An additional point made in this forum has been the large number of DBTs that have failed to demonstrate that differences can be heard. But a large number of scientifically compromised procedures yields no generalizable conclusions.

For anyone who has worked at a major university research mill, as I have, the skepticism about research results is strong. It is not that there is an anti-research or anti-science attitude. Rather, it is a recognition that the proliferation of research is more driven by the necessity of publishing to receive tenure and/or the potential for funding, increasingly from commercial interests that have compromised the whole process. We will have to see what happens to scientific DBT in audio when and if it happens.

I conclude that we are speaking fundamentally different languages when advocates of subjective audio evaluation and DBT advocates speak. For my part, subjective evaluation is fine as long as I understand that I better think twice before I believe a reviewer. I also truly believe in the supremacy of science, and intelligent design is not science.
This is because of the nature of auditory perception and its dependence on memory accrued over time (days or weeks, and not hours).

This is just 180 degrees opposite of the truth. Auditory memory for subtle differences dissipates in a matter of seconds. I defy you to cite a single shred of scientific evidence to the contrary.
In any AB comparison, the two compared signal-path elements (components, cables, tubes, etc.) must each have at least a week's trial of listening experience before being switched to the other, preferably over four or more such switches.

More immediate AB comparison is not sufficient to reveal subtle but significant differences.

This is because of the nature of auditory perception and its dependence on memory accrued over time (days or weeks, and not hours).

Changes attributed to burn-in are nearly always the result of becoming familiar with (accruing memory of) a component over days or weeks, for instance: your perception is what changes, not the component.

Immediate AB testing can be downright invalid, and is only useful for detecting large and obvious sonic differences.
I know of no one who requires anyone to take a mini-test to establish their credibility before they perform the real test. That would be tantamount to my being required to succeed in a little mini-trial before I do the real thing.
This would be equivalent to asking your doctor to examine two patients whose illnesses were known in advance before he could examine you. It would not be practical, and it would not prove anything.
Furthermore, one person passing a test does not prove anything. The sample group would have to be of sufficient size; one person's success or failure could be easily discounted statistically. Thus if, say, Harry Pearson took the test and scored perfectly, his results could easily be invalidated: the majority of the other reviewers could flunk, or have a statistically insignificant number of successes, which would mean his success was a statistical aberration. That sort of heads-I-win, tails-you-lose logic does not work.
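For what it's worth, the arithmetic behind that "statistical aberration" point can be sketched directly. The reviewer count, trial count, and the 9-of-10 passing score below are assumed purely for illustration.

```python
# If many reviewers each take a short test and all are merely guessing,
# the chance that at least one posts an individually "significant" score
# is substantial, so a lone success proves little by itself.
from math import comb

def p_at_least(n, k):
    """P(X >= k) for n fair coin flips (a purely guessing listener)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

p_single = p_at_least(10, 9)              # one guesser scores 9+/10: ~0.011
n_reviewers = 20
p_any = 1 - (1 - p_single)**n_reviewers   # at least one of 20 does: ~0.19

print(f"one guesser 'passes' with p = {p_single:.4f}")
print(f"chance at least one of {n_reviewers} guessers 'passes' = {p_any:.2f}")
```

That is exactly why a single reviewer's isolated success, or failure, settles nothing without a properly sized sample.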
This time I mean it. If you listen, the results should be obvious; if not, don't buy. I have nothing else to say.
Henry: Yes, of course, listening tests are only relevant for the specific gear you're listening to. But a review is about specific equipment. Think of it this way: A reviewer has a reference system. He gets a new amp for review. Can he tell whether his original amp or the review amp is in his system, without looking? If not, is there any value at all to what he says about the sound of the review amp?
Testing the listening abilities of a reviewer... so what are you saying, Pabelson: that at the beginning of each review there should be a summary of a DBT by the reviewer, either of the particular equipment being tested or of his listening abilities in general? The latter doesn't seem to work, as DBT by nature is specific, not broad. So if the former, you would then rule out reading the rest of the review, perhaps apart from the factual description? Really? I guess you would. Personally, while the additional information may be interesting, it does not add that much for me. The reason, again, is that the DBT is for a specific time and environment, namely the rest of the control variables: the rest of the system, which the reviewer may not be choosing. Or should he? If the rest of the system is constructed from equipment unfamiliar to the reviewer, what is he listening to? You may say he only needs to demonstrate that one piece of equipment was changed, but in such an unfamiliar setting our ears could gravitate toward and focus on something else entirely.
Pabelson, I doubt if any reviewer could "pass" the DBT. This is because of the methodology. Any substitution of other than same/different methods would likely result in rejection of the results by DBT proponents as Gregadd says. Subjectivists would no doubt ignore reviewers "failing" the test. Nothing would be proven to anyone's satisfaction by the entire effort, so what is the point?

Somehow you seem to believe that reviewers are the arbiters of quality, leading customers around like sheep. As is often noted, the "best component" issues outsell other issues. I do not know whether this "proves" the influence of reviewers or magazines; some readers may just be keeping count of where their equipment falls.

Pabelson: No, they should not be kept in the dark about the reviewer's ability.
You assume that ABX/DBT is the standard by which everyone must be measured. I categorically reject that premise! Therefore I cannot answer any questions that require me to accept that premise.

As a long-time reader of audio component reviews, I am aware of their shortcomings. However, the overwhelming majority of reviewers admit that they are fallible and that they have listening biases. Their review is their personal opinion. Their goal is to identify components which they believe bring us closest to the reproduction of music.

Let's take The Absolute Sound, for example, when Harry Pearson still owned it. They periodically published their background, room dimensions, personal listening biases, and associated equipment.
Once they have identified an audio component as having merit, it is then the reader's job to make his own evaluation. This is true for any critic. If the critic is consistently wrong, ultimately the readers will go somewhere else for opinions.

The way I evaluate a reviewer's ability is to listen to components they have recommended to see if their opinion is valid. If I can't duplicate their experience on a consistent basis, then I have to doubt their ability.
So readers should be kept in the dark about the listening abilities of the reviewer? Whose interest does that serve?
No, I don't think they should do DBTs, because even when reviewers pass the test, so-called objectivists don't accept it.
Gregadd: There are lots of different DBTs. Some measure preferences, some measure *how* different two things are, etc. I'd say a reviewer should be allowed to use whichever method he likes (or invent his own, as long as it's level-matched and blind). But if he can't tell the difference between his own amp and the one he's reviewing under those very generous conditions, I think his readers ought to know that. Don't you?

And what's your problem with saying that equipment sounds good or bad? This is an audio discussion site. Eighty percent of the conversations here are about that. As for scientific tests of good and bad sound, that's what Sean Olive at Harman does for a living. Try Googling him.
I think that DBT is a way of validating (or invalidating) "decidedly subjective" opinions about sonic quality. If you can't reliably tell which is which, your opinion about which is better must be taken with a big grain of salt.
.. and by the way, I am deeply troubled by your use of the terms "sounding good" or "bad." Those are decidedly subjective terms.
Just what scientific test did you use to come to that conclusion?

Pabelson: The DBT/ABX does not measure what sounds good or bad; it only measures whether the listener can identify A or B when compared to X.

It's not that we think ABX is not good enough; we think it's irrelevant.

It's been my experience that dealers are more of a slave to the audio press than audiophiles have ever been.

Finally, reviewers are not scientists; they are critics who give opinions as a guide. If audiophiles are taking their opinions as gospel, they should trust their own ears instead.
Therefore, participants should be able to demonstrate their critical listening skills.

Once again, the scientists are ahead of you. Standards for appropriate listener training exist. And they weren't devised based on the misapplication of principles from visual perception, let alone high-end cant; they were developed through experience that identified the background necessary to produce reliable results, both positive and negative.

If anyone doesn't feel those standards are sufficiently high, there has always been an alternative: Propose higher standards, and then find some audible difference that can't be heard without the benefit of your more rigorous training. For all the griping about DBTs, I don't see anybody anywhere doing that.

Finally, recalling the original subject of this thread, has any audio reviewer ever demonstrated that he possesses "critical listening skills" in a scientifically rigorous way? Nope. In fact, there's at least a little data suggesting that audio reviewers are *less* effective listeners than, say, audio dealers. This isn't too surprising. A dealer who carries equipment that sounds bad will go out of business. If a reviewer recommends something that sounds bad, he just moves on to the next review.
Hi Phredd2,

You asked for some additional elements for rigorous methodology. In addition to the acuity tests I mentioned previously, participants should pass reasonable memory tests. Otherwise, their inability to distinguish 2 amps may not be a statement about the amps but about the participants. It is fine with me if an audiophile wants to listen privately just to see if he/she likes or prefers a component. But this is not acceptable for rigorous testing. Therefore, participants should be able to demonstrate their critical listening skills. If they aren't accustomed to listening consciously for nuances in harmonic textures, changes in micro-dynamics, phrasings, ambience, decay, etc., then they may miss subtle differences in how 2 amps reproduce the different musical elements.

"After-effects", as pointed out in my previous posts, are inherent to our perceptual mechanisms and brain circuitry/chemistry and may smear differences between 2 components in a short-term DBT. Consequently, a negative result of a short-term DBT may have an interpretation other than "no difference in the amps". Allowing enough time for the "after-effects" to subside, is one way to reduce their effects. However, this may add to some degradation of memory, as pointed out in one of the posts above; but that just re-inforces my contention that the underlying complexity has not been unravelled enough yet to make definite determinations. Please see my exchanges with Qualia8 for additional comments.

Great Listening,
John
TBG: So who called you a fool? Who called you "anti-science"? Citations, please.
Qualia, yes, there is some minor DBTesting in wine, but as in audio, no one pays any attention to it. As with audio, tastes rather than DBT rule the buying decision. Please understand that I see nothing wrong with your making decisions based on this methodology, but I do resent those of your school calling others "anti-science" or fools.
Qualia8,

There is a vast array of specializations among the neurons in the brain. Some, as you pointed out, detect differences, others sameness; yet others, change or motion or timing, etc. Ignoring that complexity may lead both sides of this discussion to over-simplification at best and to closed-mindedness at worst.

With that in mind, let me add the flip side to my previous post to you. The after-effects may not only smear differences; they may also distort sameness. Take for example the two abstract amorphous paintings containing a rich array of colors in my living room. Everyone who looks at either one reports the same phenomena: the colors change, the amorphous shapes change, and those shapes move. Now, we know the painting remains the same; the changes are the result of the brain's processing. It appears the after-images of the various colors "combine" with the direct stimuli to produce a change in the perception, which in turn forms its after-images, which "combine" with the subsequent direct stimuli, and so on. What follows is a sequence of illusory changes which create a dynamic that is not there.

This perceptual phenomenon of after-images has been studied, but it has not been eliminated. The temptation to reduce its effect by taking micro-second intervals of music automatically prejudices the methodology against perceiving differences that require longer intervals; for example, decay and rhythm.

The debate will probably go on. In the meantime, it's good to have a discussion that produces more illumination than heat.

Enjoy the Music,
John
But if these "after effects" mattered, John, then we'd see listening test results showing that putting gaps between samples improved subjects' sensitivity to differences. I don't know of any such test results. Do you?
Qualia8,

"Far from cleansing the auditory taste of one note from one's mind and then playing another, you need to play them immediately back to back for comparison purposes. Perhaps you can switch the order around to eliminate after-effects"

Switching the order around doesn't eliminate after-effects, it only replaces one "smeared" event by another; possibly different from the first. For example, if you follow a yellow image by a red one, the complementary after-image of yellow (violet) will "combine" with red. If you switch the order and show red first, followed by yellow, then the complementary after-image of red (green) will "combine" with yellow.

Best Regards,
John
Citations, please, Pabelson. I don't follow this literature any longer, but your merely saying "we know" is not convincing.
Hi Pabelson,

You did not address some critical issues I raised in my posts, in particular that the "after effects" of sensory experience can combine with subsequent stimuli to smear differences. The "after effects" result when the brain circuits don't start and stop with the stimuli. If you continue to evade by brushing aside the issues, then there is no reason for me to continue with this thread.

Best Regards,
John
An explanation of why we pick up auditory differences closely spaced in time but not those spaced out over time:

The auditory system works like most of our perceptual systems, by detecting differences and similarities, rather than absolute values. What we detect, for the most part, are differences from a norm or differences within a scene itself (synchronically). The norm gets set contextually, by relevant background cues. This is more evolutionarily advantageous than detecting absolute qualities, because the range of difference we can represent is much smaller than the range of possible absolute value differences. By setting a base rate relevant to the situation and representing only sameness and difference from the base rate, one can represent differences across the whole spectrum of absolute values, without using the informational space to encode for each value separately.

For instance, we can detect light in incredibly small amounts -- only a few photons -- and also at the level of millions of photons striking the retina, but we can't come close to representing that kind of variation in absolute terms. We don't have enough hardware. What does our visual system do? Well, the retina fires at a base rate, which adjusts to the prevailing lighting condition. Below that is seen as darker, above that is seen as lighter. A great heuristic.

As it gets completely dark, you don't see black, but what is called "brain grey", because there is no absolute variation from the background norm. You see almost the same color in full lighting when covering both eyes with ping pong balls, to diffuse the light into a uniform field. With no differences detected, the field goes to brain grey.

Ask yourself why the television screen looks grey when it's not on, but black when you're watching a wide-screen movie. Black is a contrast color and true black only exists in the presence of contrast. Same for brown and olive and rust.

Same for happiness, actually. The psych/econ literature on happiness shows that most traumatic or sought after events are mere blips on the happiness meter, as we simply shift base rates in response, adjusting to the new conditions. Happiness is primarily a measure of immediate changes, bumps above base rate. So minor things, like good weather and people saying a friendly hello, are more tightly correlated with happiness than major conditions like having the job or the car you've been wanting.

Think about pitch. We can tell whether pitch is moving, but only the lucky few have any sense of absolute pitch... and this is usually a skill developed with a lot of feedback and practice. Why? Because it's more useful and economical to encode pitch relationally.

Far from cleansing the auditory taste of one note from one's mind and then playing another, you need to play them immediately back to back for comparison purposes. Perhaps you can switch the order around to eliminate after-effects.

By the way... wine-lovers *do* take blind taste tests. And experts can readily identify ingredients in wine, as well as many other objectively verifiable qualities. So it is perhaps not the best analogy for audiophiles who cannot do the same, and won't deign to try.
Puremusic, that's a good start on coming up with a test you would find satisfying. What would the "other things" you mention be? Would any of the other "subjectivists" in the crowd care to propose changes to the acceptable methodology? What would you find convincing?
Puremusic: Psychoacoustics and neuroscience are already way ahead of you. In fact, the kinds of things that get argued about in audio circles aren't even being researched anymore, because those questions were settled long ago.

Just to take one example, you insist on "sufficient time between samples." The opposite is true in the case of hearing. Our ability to pick out subtle differences in sound deteriorates rapidly with time; even a couple of seconds of delay can make it impossible for you to identify a difference that would be readily apparent if you could switch instantly between the two sources. (Think about it for a second: how long would a species survive in the wild if it couldn't immediately notice changes in its sonic environment?)
For the record, I am not opposed to rigorous DB tests; they can provide useful information. However, I do NOT have a high level of confidence in definitive interpretations of a negative result of a short-term DBT involving 2 components that may have subtle differences. As noted in my previous posts, the underlying complexity has not been unravelled yet.

I'll try one last time to hint at the complexity involved. In wine tasting, if you taste two samples one after the other, you should rinse your mouth with water to minimize the influence of the "after taste" of the first sample on the second. If you look at a bright yellow object and then close your eyes, you will see an "after image" of a complementary color. As long as that "after image" persists, it is "noise" that may influence subtle subsequent visual experiences. Our brain circuitry and chemistry are not like electronic circuitry: they do not start and stop with the stimulus, and they have their own variable "noise floor". The "after effect" that persists may mix with the subsequent stimuli, and this added "noise" may smear the more subtle characteristics. A SHORT-TERM DBT may not allow enough time for the "after effect" of the previous sample to subside. That "noise" in the neuro-biological environment may smear SUBTLE differences.

Those of you with a high level of confidence or faith in the negative results of short-term DBTs have yet to address this and other complexity issues. Hopefully, these issues will be sufficiently addressed as neuroscience and psychoacoustics develop. The reason a tremendous amount of research is still going on is that there is a lot that is not yet known; at least, not enough is known for me to be very confident.

In the meantime, a rigorous DBT, among other things, should:
1) provide sufficient time between samples;
2) reduce the room effects that may smear differences;
3) make sure the participants pass a comprehensive hearing test, demonstrating that they can hear the frequencies in the audible range and can perceive dynamic gradations;
4) make sure the tested material includes a full spectrum of frequencies and a large variety of harmonic textures and dynamic shadings;
5) adjust the level of sound, preferably without adding any other components into the signal path that may smear differences (a sketch of this level-matching step follows below);
etc. After all, a meta-statistical analysis of a lot of flawed DBTs is not good science.
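As a sketch of the level-matching step in item 5: the usual practice is to equalize playback levels from recorded excerpts of the same passage before comparing. The file names, the 16-bit mono WAV format, and the 0.1 dB tolerance are my assumptions for illustration.

```python
# Compute the RMS level offset (in dB) between two recorded excerpts of
# the same passage, one through each device, so levels can be matched
# before any blind comparison.
import wave, array, math

def rms_dbfs(path):
    """RMS level (dBFS) of a 16-bit mono WAV file."""
    with wave.open(path, "rb") as w:
        samples = array.array("h", w.readframes(w.getnframes()))
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768.0)

offset = rms_dbfs("device_a.wav") - rms_dbfs("device_b.wav")  # placeholder files
print(f"level offset = {offset:+.2f} dB (match to within ~0.1 dB before testing)")
```

Until something like this is done, a "difference" heard between two components may be nothing more than a fraction of a dB of loudness.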
I used to think wires made no difference. Shoot, if I go back far enough in time, I didn't believe there were differences in amps. Consumer Reports' DBT articles agreed with me. I absolutely heard a difference in front ends and speakers, so that's where all my money went.

A while back, an audio buddy brought over his new Shunyata power cords. We did a DBT, as best we could, and heard no difference between the expensive cord and the cheap stock one. I had a so-so front end and a solid-state amp running ribbon speakers.

The shocker came when we all went over to my buddy's place and easily heard the Shunyata PCs' superiority. He has clean-sounding TacT gear.

When I switched out my solid-state amp for a "digital" one, we ran another DBT. The PCs made a huge difference now. The same went for all wires; everything left its imprint on the end sound.

We noticed similar results in OTL systems.

My conclusion is that with the "golden" systems I've heard, engineered to squeeze out the last distortion-free musical morsel, one can discern small differences in all component rolling.

Maybe there is so much noise in lesser systems, despite what THD measurements say, that small wire and amp differences are smeared over and can't be heard.

In my case, the more I peeled off signal junk, the more I learned what devices produce said junk.
Okay, objectivists, one more try. I have participated in same/different DBTs and found that I could not hear differences. I have also participated in double-blind tests that merely selected which preamp sounded best; in this case differences were obvious, and most agreed on which preamp we preferred. I valued neither test, but the latter was more fun.

I am engaged in a social science and teach research methods at the graduate and undergraduate levels, so I am not anti-science. But there is good science and bad. More importantly, there is the question of whether the concepts in the hypothesis are tested by the variables in the data. I am merely stating that I am unconvinced that questions such as whether amps differ in sound are validly assessed by the short-term same/different methodology commonly associated with DBTs.

A methodology that fails to find differences among amps, wire, etc. heard by so many, even in double-blind circumstances, is not convincing. It may soothe those who cannot afford more expensive equipment, who can then dismiss those who buy it as merely impressed with faceplates or bells and whistles, or sold by hype, but it does not prove that such buyers are delusional.

I don't mind people keying their behavior on the most common "no difference" findings of DBTs, but the objectivists' feelings of superiority based on bad science are unjustified and likely to convince very few.

I apparently failed in my first posting to convey why DBTesting has failed to catch hold and why so many of us could not care less that it has. No amount of casting aspersions on subjectivists as unscientific will convince us, and obviously no amount of patience in presenting my perspective will convince you. So why don't we just drop the issue and get back to enjoying life?
I don't understand all the talk about "flawed methodology". If the methodology is flawed, make a suggestion as to how to improve it. In other words, for those who dispute the validity of DBT, please suggest a test that you would find convincing and yet would still control for the same factors (primarily listener bias) that DBT is designed to control for. Would you be convinced if a reviewer did a one month test of disputed component A in his or her own home, followed by one month with disputed component B? What about comparing a one month test of equipment with the reported price ranges and labeling reversed? What would it take to convince you?

Don't do what my friend did. He agreed to participate in a double-blind test that we discussed with him in advance. Only when the test didn't show what he expected to find did he question the methodology. So agree on the methodology first, then live with the results.

I mean this seriously. A DBT won't change anyone's mind if the testers are not convinced in advance that the test will measure something. So please help to design an objective test that you AGREE IN ADVANCE will work.

If you believe that there is no such test, then you should question your own assumptions about the validity of the scientific method in general.