They have been here. ATC specifically has designed for wide dispersion, as this is critical in larger studio control rooms where multiple people are working on the recording (a scoring session, for example) or where you have a large console with a lot of channels (for orchestra or a large production). In these cases you cannot have only one person hear the image; the mix is a group effort, and you won't build the right one that way.
Nearfield evolved in studios to let an engineer sit close to the speakers and reduce the ratio of reflected to direct sound. Basic nearfield is a two-speaker setup in a very small triangle with the listener very close, maybe only a few feet apart. This lets a mixer work in different rooms and get a similar result, because you reduce the room's influence [note: room influence cannot be completely eliminated, only reduced or exaggerated]. This is what made studio nearfield speakers like Auratones or the NS10 popular, as they are so small. The nearfield technique works only when it's one person and he or she can sit close and shrink the triangle between left, right and listener.
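To put rough numbers on the reflected-to-direct idea, here is a minimal sketch using the textbook mirror-image trick for a single side-wall reflection. Everything in it is an assumption for illustration: the function name is made up, the speaker is treated as a point source radiating equally in all directions, the wall is treated as a perfect reflector, and the speaker and listener are assumed to sit at the same distance from that wall. Real rooms and real speakers will differ.

```python
import math

def first_reflection_level_db(direct_m, wall_offset_m):
    """Level of the first side-wall reflection relative to the direct sound, in dB.

    Hypothetical helper. Assumptions: point source, perfectly reflective wall,
    speaker and listener both wall_offset_m away from the same side wall.
    """
    # Mirror-image method: the reflection behaves like a second source
    # placed 2 * wall_offset_m behind the wall-side of the real speaker.
    reflected_m = math.hypot(direct_m, 2.0 * wall_offset_m)
    # Inverse-square law: level falls 20*log10 of the distance ratio.
    return 20.0 * math.log10(direct_m / reflected_m)

# Same room (side wall 1.5 m away), two listening distances.
near = first_reflection_level_db(1.0, 1.5)  # nearfield seat
far = first_reflection_level_db(3.0, 1.5)   # farther back
print(f"nearfield: reflection is {near:.1f} dB relative to direct")
print(f"farfield:  reflection is {far:.1f} dB relative to direct")
```

Under these assumptions the reflection arrives about 10 dB below the direct sound at 1 m but only about 3 dB below it at 3 m, which is the sense in which sitting close reduces the room's influence without removing it.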
So to build a larger sweet spot you need very wide dispersion speakers with excellent off-axis response and enough SPL capability to fill the larger space (you still need to be far away from the walls to keep the reflected-to-direct sound ratio down). Larger sweet spots therefore require special speakers, special setups and larger spaces. Toe-in reduces image size every time, but some speakers need it because they are narrow dispersion (like horns): you never get them to "meet" at your location unless you use significant toe-in (how much depends on how far you sit from them). It's extremely difficult in small spaces to find the right compromise, especially since manufacturers really don't share their off-axis response with you, so you may own a narrow dispersion speaker and not know it.
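The "depending on how far you sit" part is plain geometry, and a small sketch makes it concrete. The function name and the layout are assumptions for illustration: two speakers on a line, the listener centered between them, and toe-in measured from the straight-ahead direction. The result is also how far off axis you are listening when the speakers fire straight ahead.

```python
import math

def toe_in_to_aim_at_listener_deg(speaker_spacing_m, listener_distance_m):
    """Toe-in (degrees from straight ahead) that points each speaker at the listener.

    Hypothetical helper. Assumes a centered listener, with listener_distance_m
    measured from the line joining the two speakers.
    """
    half_spacing = speaker_spacing_m / 2.0
    return math.degrees(math.atan2(half_spacing, listener_distance_m))

# Speakers 2 m apart, listener at three different distances.
for distance_m in (1.0, math.sqrt(3.0), 4.0):
    angle = toe_in_to_aim_at_listener_deg(2.0, distance_m)
    print(f"{distance_m:.2f} m back: {angle:.1f} degrees")
```

With speakers 2 m apart, an equilateral triangle (about 1.73 m back) works out to 30 degrees, sitting 1 m back needs 45 degrees, and sitting 4 m back needs only about 14 degrees. A speaker whose response falls off quickly off axis has little margin on either side of those angles, which is why narrow dispersion designs end up needing the toe-in.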
Narrow dispersion can sometimes be appealing in practice: with narrow HF, it reduces the first reflections (from the side walls) in the room and can improve the image. Owners can think their speakers image better when they actually image worse, BUT the tiny sweet spot works better in that specific space.