Seems like a complicated way to reproduce music
Yes, it is. It’s taking sound vibrations, storing them on a physical medium, then retrieving the signal and modifying and amplifying it as needed to drive the transducers in speakers. It’s an imperfect process, a facsimile of reality, but that’s all we’ve got for now.
Then you have high-end audio, which emphasizes sonic quality. That often means: lowering the noise floor, minimizing EMI and RFI, cleaning up the AC power, minimizing distortion, shortening the signal paths, minimizing internal vibrations, controlling case vibrations, designing better circuits, using higher quality/specification parts, using cutting-edge materials and techniques, etc.