I am a cognitive psychologist and have spent most of my professional life measuring consumers'/users' perceptions of products. I use advanced psychometric techniques to measure multiple components of perception and correlate each one with various physical measures of products provided by designers and/or engineers. Such an analysis provides design requirements for perceptual experience. It's stimulus/response psychology at a fairly sophisticated level.
I am also an audio enthusiast and find two aspects of perceptual experience pertinent to issues of sound quality. One is that perceptual experience is not a single thing, but a composite of four underlying factors: a) Valence (negative vs positive), b) Potency (delicate vs strong), c) Arousal (relaxing vs stimulating), and d) Novelty (ordinary vs unique).
Another interesting thing is that each component of perception is bipolar. That is, perceptual experience varies between polar opposite extremes. Perceptual neutrality is therefore literally in the middle of the polar extremes, i.e., the zero cross-over point. Hence, perceptual neutrality is, indeed, bland.
It seems to me that a neutral audio system is not one that is perceptually neutral, but one that passes an input signal through to the output without changing it qualitatively. While it is possible to measure the perception of the sound coming out of an audio system using methods such as those alluded to above, how do you measure the perception of the input? You would have to have people listen to and rate the live performance as well. A statistical comparison of the two sets of data would then tell you if they were the same or not. I have used methods very much like this to compare products to one another, perceptually. One can get very precise data regarding perceived similarities and differences between systems this way.
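As a rough illustration of the kind of comparison described above, here is a minimal Python sketch. It makes several assumptions that are not in the post: the same six listeners rate both the live performance and the reproduced sound, each factor is rated on a -3 to +3 bipolar scale, and all the numbers are invented. The exact sign-flip (paired permutation) test is just one simple choice of statistic, not necessarily the method used in practice.

```python
from itertools import product
from statistics import mean

FACTORS = ("valence", "potency", "arousal", "novelty")


def sign_flip_p_value(live, reproduced):
    """Exact two-sided paired sign-flip test on the mean rating difference."""
    diffs = [r - l for l, r in zip(live, reproduced)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    # Under "no difference", each listener's difference is equally likely
    # to carry either sign, so enumerate every possible sign assignment.
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        hits += flipped >= observed - 1e-12  # small tolerance for float error
        total += 1
    return hits / total


# Hypothetical ratings from six listeners on a -3..+3 bipolar scale.
live_ratings = {
    "valence": [2, 3, 2, 1, 2, 3],
    "potency": [1, 0, 1, 2, 1, 1],
    "arousal": [0, 1, -1, 0, 1, 0],
    "novelty": [0, 0, 1, -1, 0, 0],
}
system_ratings = {
    "valence": [2, 3, 2, 1, 2, 3],
    "potency": [2, 1, 2, 3, 2, 2],  # system heard as consistently "stronger"
    "arousal": [0, 1, -1, 0, 1, 0],
    "novelty": [0, 0, 1, -1, 0, 0],
}

p_values = {f: sign_flip_p_value(live_ratings[f], system_ratings[f]) for f in FACTORS}
for factor, p in p_values.items():
    print(f"{factor}: p = {p:.3f}")
```

In this made-up data only Potency differs between the live and reproduced conditions, so it is the only factor with a small p-value; the other three factors show no detectable difference.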
The point I wish to make is that it is possible to precisely measure perceptions of multiple stimulus situations and compare them using statistics. The multivariate nature of perception means that the target for any given system will be a profile of perceptual measures that matches that of the stimulus situation one is attempting to reproduce. But my personal preference might not be a veridical representation of some original stimulus situation. I might like the audio system's sound quality to depart from the original stimulus on one or more of the above perceptual factors. Indeed, only Valence has an obviously negative pole to be avoided. With the other three perceptual factors I am free to gravitate toward either pole, even though the result might not be an accurate reflection of the original source material. It's sort of like touching up a photograph to accentuate certain visual factors. I don't, personally, have a problem with that. You should just be clear about what your goal is.
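The idea of a target profile can be sketched the same way. The snippet below is only an illustration under assumed values: the mean ratings are invented, the -3 to +3 scale is an assumption, and the helper name profile_gap is hypothetical. It reports how far a system's profile sits from the original on each factor, plus a single overall Euclidean distance.

```python
from math import sqrt

FACTORS = ("valence", "potency", "arousal", "novelty")


def profile_gap(target, system):
    """Return per-factor gaps (system minus target) and their Euclidean norm."""
    gaps = {f: system[f] - target[f] for f in FACTORS}
    distance = sqrt(sum(g * g for g in gaps.values()))
    return gaps, distance


# Hypothetical mean ratings on a -3..+3 bipolar scale (0 = perceptually neutral).
live_profile = {"valence": 2.0, "potency": 1.0, "arousal": 0.5, "novelty": 0.0}
system_profile = {"valence": 2.0, "potency": 2.0, "arousal": 0.5, "novelty": 0.0}

gaps, distance = profile_gap(live_profile, system_profile)
print(gaps)
print(distance)
```

A distance of zero would mean the system matches the original profile on all four factors. A listener who prefers a deliberate departure, say more Potency, would simply aim for a different target profile than the live one.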