I agree that a reference / standard needs to be established. Formal dictionaries such as the Oxford English Dictionary (OED) define a stereo(phonic) system as a system of sound recording or reproduction using two or more separate channels to produce a more realistic effect by capturing the spatial dimensions of a performance. So, imho, two key elements define a reference stereo system:
1. Realistic effect; and
2. Spatial dimensions.
In my pursuit of a reference audio system, I aim to achieve a setup that faithfully reproduces the tonal character (timbre) of instruments and voices with a high degree of accuracy. This includes precise imaging, where each sound source is rendered with a clear, stable, and locatable presence within the soundstage. So, if I were to lay out a specific metric for the reference system, it would include:
1. Timbre accuracy;
2. Imaging; and
3. Soundstage (SS) width, depth and height.
This metric may not be measurable instrumentally, but it can certainly be perceived in your listening space with a good pair of ears and a discerning mind. You could always expand it to include other elements you consider paramount. But bear with me for being simple-minded, and tell me, are you there yet?