Amir and Blind Testing


Let me start by saying I like watching Amir from ASR, so please let’s not get harsh or the thread will be deleted. Many times, Amir has noted that when we insert a new component in our system, our brains go into (to paraphrase) “analytical mode” and we start hearing imaginary improvements. He has reiterated this many times, saying that when he switched to an expensive cable he heard improvements, but when he switched back to the cheap one, he also heard improvements, because the brain switches from “music enjoyment mode” to “analytical mode.” Following this logic, which I agree with, wouldn’t blind testing, or any A/B testing, be compromised because our brains are always in analytical mode during the test and are therefore feeding us inaccurate data? Seems to me you need to relax for a few hours at least and listen to a variety of music before your brain can accurately assess whether something is an actual improvement. Perhaps A/B testing is a strawman argument, because the human brain is not a spectrum analyzer. We are too affected by our biases to come up with any valid data. Maybe.

chayro
I think @atmasphere did a good job in his short post w.r.t. distortion https://forum.audiogon.com/posts/2377289

@henry53 

In a recent test Amir 'beheaded' a product (i.e. gave it the lowest rating) because it had an unacceptable 0.003% distortion. Can anyone hear 0.003% distortion? Are any speakers capable of even 0.03% distortion? What about 0.3% distortion? I have since listened to and bought the product, and it sounds wonderful. Several others agree. The measurements have spoken, but what do they mean?

I've seen him downgrade a product because it didn't measure as well as the competition or it was substantially more expensive with no gain in value. A downgrade doesn't necessarily mean it doesn't sound good.

The opposite is also true. I've seen him recommend products that don't necessarily measure well (although the flaws are still inaudible), but are a great value.

 

Most of you on this forum have likely never heard of a gauge R&R (repeatability and reproducibility) study. Most also likely do not understand the difference between accuracy and precision. That's not a slight. This is a difficult concept, and much work has been done to define it and apply it to test and measurement equipment. I want to start with something most of us know quite well: the bathroom scale. If you are like me, you have a love/hate relationship with your bathroom scale. It's a simple device that can either make or break our day, and yet we typically do not think twice about whether or not it is telling us the truth. What do I mean? Well, for starters, I can get on my bathroom scale three times consecutively and get three different readings with a range of 2 or more pounds. Even worse, I find that I can move the scale around on the floor and get even more variation. This is one of the newer scales with a digital readout to tenths of a pound. While my bathroom scale indicates a precision of 0.1 lbs, the repeatability is much worse, which implies the accuracy is likely off by a few pounds. I don't know, because my bathroom scale has no reference back to a standard.

I notice the scale at the Doctor's office has much better repeatability. I see just 0 to 0.1 lbs of variation if I step off and back on again, and the Doctor's scale appears to have higher precision, based only upon the display showing hundredths of a pound. But I have rarely seen a calibration sticker on the scale in the Doctor's office. I have seen stickers on the scales at a research dept and at the hospital, probably because they publish reports.

Accuracy is typically not well defined. Typically, gauges are rated accurate to within a certain percentage of full scale. Let's say a bathroom scale is rated to +/-0.5% of full scale. (Not likely that good for a $30 scale.) That means the manufacturer is stating that any reading will be (for a 400 lb scale) within +/-0.5% of 400 lbs, or +/-2 lbs. So I could have lost one pound overnight, but my bathroom scale might tell me that I gained one pound! Isn't that frustrating?
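
To make the full-scale arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and the worst-case assumption (that the two readings can each be off by the full tolerance, in opposite directions) is my own simplification, not anything from a manufacturer's spec.

```python
def full_scale_tolerance(full_scale_lbs, accuracy_pct):
    """Tolerance in lbs for a gauge rated to +/- accuracy_pct of full scale."""
    return full_scale_lbs * accuracy_pct / 100.0


def indicated_change_bounds(true_change_lbs, tolerance_lbs):
    """Worst-case range of the difference between two readings.

    Assumption: each reading may independently be off by up to
    +/- tolerance_lbs, so their difference may be off by twice that.
    """
    return (true_change_lbs - 2 * tolerance_lbs,
            true_change_lbs + 2 * tolerance_lbs)


# 400 lb scale rated to +/-0.5% of full scale
tolerance = full_scale_tolerance(400.0, 0.5)

# Suppose I really lost one pound overnight
low, high = indicated_change_bounds(-1.0, tolerance)

print(f"tolerance: +/-{tolerance} lbs")
print(f"indicated change could be anywhere from {low} to {high} lbs")
```

With these numbers the tolerance works out to +/-2 lbs, so under that worst-case assumption a true one-pound loss could show up as anything from a 5 lb loss to a 3 lb gain. A reading of "gained one pound" is well inside what the rating allows.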

What's my point?  Let's say you go to the butcher shop and you buy a 10 lb ham.  Then you stop by another shop and just to see, you weigh the ham on their scales and find it only weighs 9 lbs.  Wouldn't you be upset?   How about you stop at the gas station and buy 10 gallons of gasoline only to learn you actually got just 9 gallons.  Well, take comfort in knowing that by law those scales and gas pumps are calibrated back to a standard.  If you look at the scale at your butcher shop you should see a calibration sticker.  The same goes for your local gas pump.  Take a look on the face plate of the pump for the calibration sticker.  

If we count on these everyday items to tell us the truth, then why not expect the same regarding measurements of stereo gear? To know that the data is telling us the truth, it is crucial to know that the equipment was calibrated to a standard, what test equipment was used, and what procedure was followed, so that the measurements can be duplicated or verified by someone else. Also important is to know how these particular measurement data relate to how the piece of gear performs. For example, I can measure the resistance of two different speaker cables with an ohmmeter, or even a resistance bridge for more precision, and still find no difference. So why do they sound different? Some speculate that better cables reject RF noise. Sounds reasonable to me. So why hasn't someone published test data showing the RF rejection characteristics of different cables? Maybe they have, but I just have not seen it. This would not be easy testing. It would require a Faraday cage and some sophisticated measurement equipment. Still, we cannot and should not take every measurement at face value and draw conclusions from it about what we are or are not hearing. I had my own saying in Engineering: "No one believes the test data except for the person who took it. Everyone believes the calculations except for the person who made them."

So in my vanity, I will take several readings on my bathroom scale but accept only the lowest reading.  I don’t do a statistical calculation of the group of readings.  That’s the very definition of biased testing, I think.  And what’s it matter?   When I go to the Doctor’s office they will not accept my weight based on my scale’s readout.  They take their own measurement on their scale.  No one believes the test data except for the one who took it…
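
For anyone curious what the "statistical calculation of the group of readings" might look like, here is a rough Python sketch using only the standard library. The three readings are hypothetical numbers picked to match the 2 lb spread described earlier, and `summarize_readings` is just an illustrative name. This is a simple repeatability summary, not a full gauge R&R study.

```python
import statistics


def summarize_readings(readings_lbs):
    """Summarize repeated readings of the same weight on one scale."""
    return {
        "mean": statistics.mean(readings_lbs),
        "stdev": statistics.stdev(readings_lbs),  # sample standard deviation
        "range": max(readings_lbs) - min(readings_lbs),
        "lowest": min(readings_lbs),
    }


# Hypothetical: three consecutive readings, same person, same scale
readings = [181.2, 182.6, 183.4]
summary = summarize_readings(readings)

# How far the "vanity" number sits below the average of all readings
vanity_bias = summary["mean"] - summary["lowest"]

print(f"mean of readings: {summary['mean']:.1f} lbs")
print(f"spread (range):   {summary['range']:.1f} lbs")
print(f"lowest reading:   {summary['lowest']:.1f} lbs")
print(f"bias from keeping only the lowest: {vanity_bias:.1f} lbs")
```

Keeping only the lowest reading always reports a number at or below the average, which is exactly the kind of selection bias the post is joking about.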