As to why, I never questioned it, but I assumed each autoformer has an input and an associated output tap depending on the voltage reduction (i.e., volume setting), and that one AVC was required for each audio signal line: one per channel for single-ended and two per channel for balanced. Whether it could be done differently I don't know, but they all seem to use two for single-ended and four for balanced. Maybe somebody who has built one can answer your question.
@mitch2 FWIW Dept.:
I've been designing balanced audio equipment for the high end market for 40 years; we pretty much introduced balanced line operation to high end audio. All you need to do to make a transformer operate balanced is not connect either side of the winding to ground (those are the connections going to pins 2 and 3 of the XLR connector) and only ground the case of the transformer (which ties to pin 1 of the XLR). That's how it's done in the recording studio and in broadcast. If what you're saying is correct, my surmise is they don't understand this aspect of the balanced line system (codified by AES48, one of the balanced line standards). But I've run into that in high end audio a lot.
With two transformers per channel (one for each phase) it's pretty obvious that one side of each transformer is grounded. When you do that you lose one of the primary benefits of balanced line operation, which is immunity to the interconnect cable, so that the cable doesn't have a 'sound'. Put another way, if you hear differences between cables you've got a problem. Another reason to run a single transformer is that you maximize Common Mode Rejection Ratio (CMRR), which is to say you improve the system's ability to reject that which is not the signal, such as hum and noise. Two transformers for one channel would require that the transformers and all the associated components be matched; even if this were done to tolerances of less than 1% you'd still have a dramatic loss of CMRR, by a good 20-40dB.
By comparison, a single transformer driving or receiving a balanced line has a nearly ideal CMRR (which is to say as high as it can get, as much as 120dB).
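For anyone who wants to see why matching matters so much, here is a rough sketch in Python. It is not a model of any particular AVC or transformer. It just assumes each phase goes through its own path with its own gain (the names gain_pos and gain_neg are made up for illustration) and that the output is the difference of the two. Under that assumption the common-mode gain is the gain difference, so the mismatch alone caps the CMRR no matter how good each part is.

```python
import math


def cmrr_db(gain_pos: float, gain_neg: float) -> float:
    """CMRR in dB for two separate signal paths whose outputs are subtracted.

    Assumed toy model: out = gain_pos * v_pos - gain_neg * v_neg.
    With v_pos = v_cm + v_d / 2 and v_neg = v_cm - v_d / 2 this gives
      differential gain = (gain_pos + gain_neg) / 2
      common-mode gain  = gain_pos - gain_neg
    """
    a_diff = (gain_pos + gain_neg) / 2.0
    a_cm = gain_pos - gain_neg
    if a_cm == 0:
        # Perfectly matched paths reject common-mode signals completely in this model.
        return math.inf
    return 20.0 * math.log10(abs(a_diff / a_cm))


if __name__ == "__main__":
    # Hypothetical mismatch values between the two paths: 1%, 0.1%, 0.01%.
    for mismatch in (0.01, 0.001, 0.0001):
        print(f"mismatch {mismatch * 100:g}% -> CMRR about {cmrr_db(1.0, 1.0 - mismatch):.1f} dB")
```

In this toy model a 1% mismatch between the two paths limits CMRR to roughly 40dB. Real circuits have other factors (source and load impedances, winding capacitance), so treat the numbers as an illustration of the trend only.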