In defense of ABX testing


We audiophiles need to get ourselves out of the Stone Age, reject mythology, and say goodbye to superstition. Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system. Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!

Here's an interesting thread on the hydrogenaudio website:

http://www.hydrogenaud.io/forums/index.php?showtopic=108062

This caught my eye in particular:

"The problem with sighted evaluations is very visible in consumer high end audio, where all sorts of very poorly trained listeners claim that they have heard differences that, in technical terms are impossibly small or non existent.

The corresponding problem is that blind tests deal with this problem of false positives very effectively, but can easily produce false negatives."
psag

Showing 15 responses by zd542

Finally, I've been waiting for someone to show us how to do this the right way.

"Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!"

Remember, those are your words. You're stating in no uncertain terms that you, not someone else, but you, Psag, know how to do this the right way. Now show us. Up until this point, the only thing people with your view do is talk. That's not science. If you're right, and you really know what you are talking about, pick 2 products, conduct the test, and report your findings in a way so the rest of us can try it for ourselves. That's how a real scientist would do it. No more talk and excuses. Put your money where your mouth is and finally do one of these tests.
""The problem with sighted evaluations is very visible in consumer high end audio, where all sorts of very poorly trained listeners claim that they have heard differences that, in technical terms are impossibly small or non existent.

The corresponding problem is that blind tests deal with this problem of false positives very effectively, but can easily produce false negatives.""

I have the above quote as coming from the thread you mentioned.

"Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!"

That one appears to be yours. If not, tell me where it's from, because I didn't see it.
"01-15-15: Geoffkait
There is no such thing as a test that can be generalized. Someone else may get entirely different results. Then which test is correct? And who decides?"

The whole issue is that there are no tests. There never have been and there probably never will be. The title of this thread is: "In defense of ABX testing". What testing? For years I've been asking these people to show me some of the tests they've done to back up what they say. They can't. The best they can ever do is bring up concepts from psychology like expectation bias and just hang on to that like it's the answer. lol. I have a degree in psychology. I know full well what these terms mean and how they are applied. And if you think that you're dealing with a case of expectation bias, you still have to test for it. Otherwise you're just guessing.

The reason these guys won't do any tests is that they know there's a really good chance they'll be wrong and they don't want to look bad. What's the first thing the OP says when I challenge him and his science?

"We Audiophiles need to get ourselves out of the stoneage, reject mythology, and say goodbye to superstition. Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system. Likewise, any reviewer who claims that ABX testing is not applicable to high end audio needs to find a new career path. Like anything, there is a right way and many wrong ways. Hail Science!"

"Actually they are not my words."

Well, he's probably right since you can't actually own words.
"01-17-15: Geoffkait
Judging from the AES paper by Olive the listening tests are, in fact, excessively complicated, a criticism he dismisses. Furthermore the listening tests apparently involved only frequency response. What happened to other audiophile parameters such as musicality, transparency, soundstaging ability, dynamics, sweetness, warmth, micro dynamics, pace, rhythm, coherence, to name a few? One supposes testing for those parameters would make the tests way too complicated. Maybe Olive thinks those parameters are too subjective, who knows?"

You couldn't have said it any better. If you read through the Hydrogen posts, those guys get mad because the reviewer listens to a component and puts what he hears into a review. What else would you have them do? I mean the intended purpose of a piece of audio equipment is to use it to listen to music. The nerve!

Bob_reynolds,

You've stated in the past, in no uncertain terms, that you can look at the specs of a component and tell how it sounds, without listening to it. Do you really expect anyone to believe that you can list all the qualities that Geoffkait states in his post without listening to whatever the component is? It's hard enough to do that when you have the piece in your own listening room.
"Nobody is denying the existence of placebo effect or expectation bias or it's ugly sibling the reverse expectation bias or any other such psychological effects. But to declare that there are no proper tests is a little bit inaccurate."

No, it's not. If you take my quote that you reference, "For years I've been asking them to show me the tests," and put it in context with the rest of my statement:

"If you're right, and you really know what you are talking about, pick 2 products, conduct the test, and report your findings in a way so the rest of us can try it for ourselves. That's how a real scientist would do it."

we get a different picture. To me, if you wanted to conduct a scientifically valid listening test to compare 2 audio products, I can't see you going wrong doing it the above way. Not only that, if you read through the thread that the OP referenced, you'll see that at least some of them do agree with me. So again, it all boils down to the same exact thing. Show me the tests. If my declaring that there are no proper tests is a little bit inaccurate, by all means, show me. I'll settle for just 1. And just to be clear, I really have been asking for years. I'm 100% serious. If you really want to, you can check my old AG threads.

Sorry, but before I forget: "I suspect maybe you've been asking the wrong people." Maybe you're right about that. Who do I ask? I mean if I don't get results at a place like Hydrogen, where it's their mission statement to go by science, and where they will actually censor threads if they don't contain content approved by the moderators, then who do I ask? If you really take a step back and look at this whole issue, the people who talk the most about doing these listening tests avoid them like the plague. It doesn't even make sense.

"01-17-15: Bob_reynolds
Drs. Floyd Toole and Sean Olive have been doing blind listening tests of loudspeakers for over a decade."

I've seen all these before. I'm assuming everyone else here has too because they are fairly popular. They don't address the issue here, which is comparing specific components. Here's a piece of the OP's post.

"Especially the reviewers, who do us a disservice by endlessly writing articles claiming the latest tweak or gadget revolutionized the sound of their system."

Can you show us some scientifically valid listening tests that were done comparing individual components as part of a review?
"Probably I shouldn't have used the word 'science', which seems to get people in an uproar. Perhaps a better word would have been 'logic'. It is logical to assume that by using standard ABX testing, one can determine with certainty which of two testing scenarios sounds better. And in fact, that assumption turns out to be true."

No. It's not logical to assume that, and the assumption is not true. Science and logic are not the same thing. Science proves the earth is round while logic says it's flat. Before science proved this, it was logical to assume you could fall off the edge of the earth if you went far enough in one direction.
"01-20-15: Jea48
Back in the medieval days was it science or logic for the times when doctors used to bleed a patient saying the patient had too much blood?"

Neither. It was stupidity. They had no way of knowing how much blood was too much blood. The only thing they knew for sure was that if you lost enough blood, you died.

"01-20-15: Onhwy61
Zd542, I think I get your point, but the earth is flat is not a good example. Ancient Egyptians and Greeks figured out that the earth was round via observation and logic."

Yes, but not everyone knew that. And without knowing the earth is round, it is logical to assume that it's flat. From their perspective, that's how the world appeared. Also, and more importantly, it's true that Ancient Egyptians and Greeks figured out the earth was round. But it became a logical conclusion only after study and observation were done. They got direct results that proved otherwise.

"I thought the point of A/B testing was to determine if there was a difference, not a preference?"

Absolutely. If I said otherwise, point it out because it's a mistake. Even though my view is that listening is the most important part of evaluating an audio component, I don't see why A/B testing, to root out differences, if any, is not worthwhile. Especially when the differences are small. For example, a test would come in handy if a reviewer has a difficult time hearing a difference between 2 products. We've all been there. Sometimes it's hard to tell. Having some type of conclusive data concerning these areas can help. But still, if a reviewer included this type of data in a review, it would be important to list exactly how the test was done and with what type of equipment. The reason, of course, is that not everyone has the same equipment, hearing ability and listening skills. And while not perfect, test results can be used as an aid, just like measurements. Something to help with a selection, but not to be taken as absolute.
"LOL, but doctors in that time period of history didn't know any better."

That was my point. The only thing they knew for sure about blood was that if you lose too much of it, you die. Armed with only that info, I stand by my statement. lol. Maybe there was some logic to it, but I don't see it. I guess it's possible that every time a doctor saw a bleeding, injured person, he thought that was the body getting rid of some excess blood. Why blame it on the arrow stuck in the arm?

About the rest of your post, my comments on all this, in context, are in reference to the thread the OP mentioned on Hydrogen Audio. The complaint was that reviewers were listening to audio components and then basing their review on what they heard, and not doing any type of scientific listening tests. They've been complaining about this very thing for years. My comment was that if these types of tests are so damn important, then just do them already. Even if it's just a few tests to show us all how to do it. Instead, it's just year after year of complaining, and they never do anything. That said, if they can come up with some kind of useful tool to help better evaluate audio equipment, I know I would be interested in seeing it. Why not? My personal opinion is that they don't have the guts to do anything they talk about. There's always the chance they would be wrong. They have too much invested in the argument.

"In ABX testing, the definition of the cumulative response is trivial: can reliable hear a difference or not. But because the stimulus is likely to be imprecisely defined, absence of reliably making the distinction does not mean there is no difference."

I agree with that. My view is that you would have to tell the test subjects what they're listening for. There's really no way around it; they have to know. To offset results that are not accurate, you can increase the number of tests, or tries, each subject takes. So, for example, if you were trying to test to see if a difference can be heard between a silver cable and a copper cable, all other things being equal, maybe have them listen to 50 or 100 samples. Maybe they can get lucky and guess correctly for 5 or 10, but 100 is highly unlikely. Not only that, under the same scenario, you can tell them exactly what they should be listening for. If there is really no difference to be heard, over time/individual tries, the test subjects will have to trend towards a 50/50 split. It won't matter what they think, or know, they can hear.
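
If you want to see the actual odds behind that, here's a quick Python sketch (my own illustration, assuming a simple two-choice test where a pure guesser is right half the time):

from math import comb

def p_at_least(k, n, p=0.5):
    # Probability of k or more correct answers in n independent two-choice trials.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of hitting 80% or better by pure guessing:
print(p_at_least(8, 10))    # 8 of 10 correct: about 0.055, lucky but possible
print(p_at_least(80, 100))  # 80 of 100 correct: under one in a billion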
"50 to 100 samples you say? Why not make it 100 to 200? Over how many years do you expect your ABX listening test experiment to take?

Just curious have you ever A/B compared 2 or 3 cables to one another? More than 3 or 4 at a time? Could you hear audible differences between the cables?"

Yes, I have. I did an experiment a few years ago and compared AQ Cheetah ICs to a pair of AQ Panther ICs. Both cables are identical except for the conductors themselves. One silver, one copper. The goal was to see if a difference could be heard between the 2 metals, and nothing else. It wasn't about which one sounded better, just if there was a difference. There were 4 of us who took the test, and we each listened to 100 samples of a 10 second audio clip, which took around 30-40 minutes for each of us.
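
For anyone wondering what would count as better than chance in a test like that, here's one way to set a pass mark. To be clear, this is just an illustration in Python, not the exact scoring rule we used:

from math import comb

def p_at_least(k, n, p=0.5):
    # Probability of k or more correct answers in n two-choice trials.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 100
pass_mark = next(k for k in range(n + 1) if p_at_least(k, n) < 0.05)
print(pass_mark)  # 59: a blind guesser scores 59+ out of 100 less than 5% of the time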

"50 to 100 samples.... Do you believe there are people in the world that can tell which key of a piano is struck on a tuned grand piano in a blind test? Do you think their brain learned the sound of each key in the span of a week or so, or even a few months or so? How about in a year? "

Actually yes, and I can prove it. My brother has something called perfect pitch. He can tell with 100% accuracy what any note or chord played on any instrument is, and whether it's in tune or not. I don't have it myself, but if you have ever played an instrument, you can develop something called relative pitch. It's not as good as perfect pitch, but it's a skill that can be learned. For me, I needed to develop the skill somewhat when I played drums in school. If you have ever seen kettle drums or timpani, they have to be tuned to a certain note when you play them. That is what the foot pedal is for; it sets the tension on the drum head. Anyway, you have to be able to set the drums to different notes while the band is playing. To do this, you tap the head very lightly (because the band is playing) and hopefully tune it to the right note before you need to play it. It's not an easy thing to do, but it's a skill that can be learned.
"And the findings, results, of the listening test?

Just curious what is behind your thinking of needing so many samples for your listening test?"

The real issue here is that you don't care about any tests that were done. You've got your emotions tied up in all this and just want to win the argument, and be right.

You ask me why I needed so many samples for my tests. Not only did I already give the answer in a prior post, you quoted it in your last post! Here it is. Maybe you'll remember it this time.

"So, for example, if you were trying to test to see if a difference can be heard between a silver cable and a copper cable, all other things being equal, maybe have them listen to 50 or 100 samples. Maybe they can get lucky and guess correctly for 5 or 10, but 100 is highly unlikely. "

I thought I was pretty clear, but I'll try to explain it again. With 5 or 10 samples in a simple yes or no test, I thought it wouldn't be out of the question to get an inaccurate score due to errors or guessing. A small sampling really leaves no margin for error. I mean, if I flip a coin 10 times, what are the chances of getting 5 heads and 5 tails and averaging the 50% the test should? There's a very high probability you won't average 50% with such a small sample. Flip the coin 100 times and you will get much closer to the statistically accurate 50% that you should be getting. Now, just 1 last time to be extremely clear: if I flipped a coin 10x, you have a much greater chance of getting something other than 50% than if I were to flip it 100x. That's why I needed so many samples. Didn't you have to take statistics in college?
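
Don't take my word for it; the math is easy to check. Here's a short Python sketch (my illustration) of the chance that a run of fair coin flips lands within 10 points of 50% heads:

from math import comb

def p_within_10_points(n, p=0.5):
    # Chance the heads count lands between 40% and 60% of n fair flips.
    lo, hi = round(0.4 * n), round(0.6 * n)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

print(p_within_10_points(10))   # ~0.66: 10 flips stray outside 40-60% a third of the time
print(p_within_10_points(100))  # ~0.96: 100 flips almost always stay inside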

And as to the results of the tests, they're not relevant to this discussion. You only want me to list the results so you can comb through them to find the slightest detail, just so you can claim the whole thing is null and void and you get to be right. I won't play that game. You're just going to have to continue playing with yourself like you've been doing.
After my last post something tells me I won't get anywhere, but here goes.

"if test are not done scientifically and are not based on "opinions" they really aren't real to me. How does one measure whether the equipment accurately demonstrated the sound stage depth? dimensionality? etc. I hear many opinions of the reviewers, but based on what? What criteria? are you going by memory in your opinions and comparisons? or did you listen intently and then switch out that amp with another (without changing anything else) and listen again?

I have read of some reviews that do exactly that. And the equipment they are reviewing is compared to similar equipment within the price point. That is alright for me. But, I still prefer an A/B comparison test that is blind to really identify the sonic differences in an unbiased way."

In the first paragraph, you're talking about subjective qualities that the reviewers are discussing. We all know that the qualities mentioned can't be measured, so what would you have the reviewer do? We're supposed to be adults here. When I read a review it's not too difficult to pick out the things that are purely subjective in nature. Yes, they are listening to the component and writing their subjective opinion of what they heard. Here's the one detail that many people miss. Most of the people that read the reviews, the magazine's customers, know this is how they do it, and that it's not a perfect process, but they still want the review anyway. And why not? Why do you think they bought the magazine to begin with?

This caught my eye in particular.

"But, I still prefer an A/B comparison test that is blind to really identify the sonic differences in an unbiased way.
"

You say that you prefer this type of blind testing as if there are some reviewers who are doing it. I've never seen any reviewers do this. Where are you finding them? I'm more than willing to give them a chance. If they can show me some testing that helps make a better decision, I'm all for it.
"Your last post leaves me with the impression you do not think there are differences in cables and therefore they cannot be heard, especially when you get defensive when I asked the results of your controlled listening tests. Beats me why you wouldn't want to disclose the results."

You're allowed to have any impression you like. It has nothing to do with me. It's your choice, not mine. As for the reason why I don't want to disclose the results, once again, I already gave it. It was clearly stated in my last post. Here it is again.

"And as to the results of the test's, its not relative to this discussion. You only want me to list the results so you can comb through them to find the slightest detail just so you can claim the whole thing is null and void, so you get to be right."
"01-29-15: Psag
I agree that only a limited number of switches are needed, if the test conditions are good. What are good test conditions? A treated room with good acoustics, high quality electronics, well-recorded music, the ability to do rapid switching (having a second person to manipulate the hardware helps), and familiarity with the musical selections. That's all you need to eliminate subjectivity and get to the truth."

What I was referring to was a very simple test. You have 2 cables in the system, 1 is copper, the other silver. The goal was to see if you could pick out the silver or the copper, and that's it. Nothing subjective like which cable sounds better. That's just personal preference. So after you hear a 10 second clip of music, you say copper or silver. With such a small sample, you can't really weed out things that may produce bad results. For example, let's say that there really was no difference that a test subject could hear between the 2 cables. That would mean both cables would sound identical. But we won't know that until after the test. That would also mean every answer given would only be right by pure chance. So for this test, since there are only 2 answers, and going by the assumption that there is no difference, over time the answers would have to conform to a 50/50 split. If we only got 10 samples under this scenario, there's a really good chance you wouldn't get a 50/50 split with just 10 tries. With 100 tries, you get much closer. An easy way to visualize this concept, or even try it for yourself, would be to flip a coin. Flip it 10x, and even though you should get 5 heads and 5 tails, with so few tries you can easily get different results. The only way to reduce this type of error is to take a larger sample. Flip a coin 100 times, and you'll get much closer to the 50/50 split that you would expect from pure chance.
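
And if flipping a real coin 100 times sounds tedious, here's a quick Python simulation of the same no-difference scenario (my own illustration, not the exact protocol we used):

import random

def guess_run(trials):
    # Fraction 'correct' when every copper-or-silver answer is pure chance.
    return sum(random.random() < 0.5 for _ in range(trials)) / trials

random.seed(1)  # fixed seed so the runs repeat
for trials in (10, 100):
    scores = [round(guess_run(trials), 2) for _ in range(5)]
    print(trials, scores)
# Typical result: the 10-trial scores bounce all over the place,
# while the 100-trial scores hug 0.50, just like the coin flips.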
Psag,

Sorry, but I don't buy your last post. It makes absolutely no sense. You mean to tell me that the concept of probability, a simple coin toss, can't be addressed by you because you feel it's pseudoscience? You don't have to be an actuary to understand the concept; they teach it in grade school. And then you go on with all the subjective issues when I clearly stated that the whole purpose of the test was to not go there.

Given the above, the only thing that makes your response understandable is if there is some type of deceit involved. I guess that maybe you don't like me from another thread, or you have a technical background that your ego needs to support, or something. Whatever it is, I don't believe you don't understand something so simple. I believe that you won't understand something so simple.

Sorry, but I just don't see it any other way.