"You can have a system that lets you hear every detail on a recording and also engages you emotionally, or you can have a system that gets all the detail but none of the emotion."
This approach is entirely incorrect. When we listen to an audio recording with no visuals, every trace of emotion in the performance reaches us through the recording and our playback system. We can perceive emotion in the performance only through what we hear. Therefore, if we wish to obtain an authentic impression of that emotion, it is essential that our system is 'discerning'. (OP means 'accurate'; discernment is not within the province of inanimate equipment.)
It is also essential that the recording accurately reproduces the emotion in the original performance. In many cases this does not occur, and failings in the recording cannot be put right by even the most accurate system. As @prndlus points out, where this occurs we have no way of knowing what the emotion in the performance was. If you like, the original emotion is fixed, but there are two variables, the recording and the playback system, and both have to be pinned down before we can say anything about the original emotion.