Floyd Toole's _Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms_ is the best synopsis of 50+ years of research into how human hearing interacts with loudspeakers and rooms.
It'll help you understand what good speaker design goals should be.
Vance Dickason's _The Loudspeaker Design Cookbook_ is the canonical text on speaker design. If you want a copy, you'll do better getting it from a speaker-building supply place like partsexpress.com or madisound.com.
On-line, Siegfried Linkwitz (as in Linkwitz-Riley filter, Linkwitz Orion speakers, etc.) has some decent coverage of speakers + rooms
http://www.linkwitzlab.com/frontiers.htm
as does John Kreskovsky, especially the polar/power response illustrations
http://www.musicanddesign.com/Old_Home.html
As for first-order cross-overs: the phase distortion of even fourth-order Linkwitz-Riley cross-overs is inaudible in blind tests with musical signals. First-order designs with non-coincident drivers have a broader, shallower power response notch than higher-order cross-overs. Because excursion keeps doubling with each octave below the cross-over point, first-order designs are forced to higher cross-over points, which moves where that notch sits. The higher cross-over point in turn precludes light, stiff cone materials because of their resonances, and so makes stored energy less likely. You end up with big differences in power response, which approximates what you hear (to be more pedantic, your brain adds up the spectra from everything it identifies as a reflection), and a lower chance of stored energy. That's all quite audible.
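To put rough numbers on the excursion point, here is a minimal sketch. It assumes an idealized driver working above its own resonance, where constant SPL needs cone displacement proportional to 1/f^2, and it uses textbook magnitude responses for a first-order high-pass and a fourth-order Linkwitz-Riley high-pass. The function name and the 2 kHz cross-over point are made up for illustration.

```python
import math

def relative_excursion(f_hz, fc_hz, order):
    """Cone displacement relative to an arbitrary reference.

    Assumption: the driver is above its own resonance, so constant SPL
    needs displacement proportional to 1/f^2. The high-pass filter
    magnitude then scales that demand.
    """
    x = f_hz / fc_hz
    if order == 1:
        # First-order high-pass magnitude: 6 dB/octave below fc.
        hp = x / math.sqrt(1.0 + x**2)
    elif order == 4:
        # Fourth-order Linkwitz-Riley high-pass magnitude
        # (a second-order Butterworth magnitude, squared).
        hp = x**4 / (1.0 + x**4)
    else:
        raise ValueError("only orders 1 and 4 are sketched here")
    return hp / x**2

fc = 2000.0  # hypothetical cross-over point in Hz
for octaves_below in range(4):
    f = fc / 2**octaves_below
    print(f"{f:7.0f} Hz  "
          f"1st order: {relative_excursion(f, fc, 1):6.3f}  "
          f"LR4: {relative_excursion(f, fc, 4):6.3f}")
```

Well below the cross-over point the first-order column roughly doubles with each octave down, while the LR4 column shrinks, which is why the steeper filter lets you cross over lower.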
The ideal speaker is flat on-axis, with a monotonically increasing directivity index and off-axis curves that look a lot like the on-axis curve. People prefer this regardless of listening background, preferred musical genre, and country of origin (Harman has a computer-controlled speaker mover allowing blind comparisons, which Toole and Olive have used to reach interesting results). Individual preference only becomes significant when you depart from that ideal, with people differing in which distortions they object to least.
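If you have measurements, the directivity index criterion is easy to check. The sketch below assumes the directivity index in dB is the on-axis level minus the sound power level at each frequency, with sound power expressed as the level of an omnidirectional source radiating the same power. The function names and all the numbers are invented for illustration, not measurements of any real speaker.

```python
def directivity_index(on_axis_db, sound_power_db):
    """DI in dB per frequency point: on-axis level minus sound power level."""
    return [on - pw for on, pw in zip(on_axis_db, sound_power_db)]

def is_non_decreasing(values, tolerance_db=0.0):
    """True if each value is at least the previous one, within a tolerance."""
    return all(b >= a - tolerance_db for a, b in zip(values, values[1:]))

# Invented data at five rising frequency points.
on_axis = [85.0, 85.0, 85.0, 85.0, 85.0]          # flat on-axis
smooth_power = [84.0, 82.0, 80.0, 78.0, 76.0]     # power falls smoothly
notched_power = [84.0, 82.0, 78.0, 79.0, 76.0]    # dip near a cross-over

for label, power in (("smooth", smooth_power), ("notched", notched_power)):
    di = directivity_index(on_axis, power)
    print(label, di, "monotonic:", is_non_decreasing(di))
```

A dip in power response around the cross-over shows up as a bump in the directivity index, so the second data set fails the check even though its on-axis curve is flat.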