Consider a kick drum on an open field where there are no reflections. The higher frequencies of the beater hitting the head arrive before the full amplitude of the head deflection, yet you still identify it as a kick.
You can see a kick waveform here:
http://192.168.1.160/assets/CblSnkOil/Signal_vs_ACLine.png
Putting the same kick drum in a room does not change your identification of it as a kick, but you are fully aware it is now a kick in a room.
In a multi-driver speaker, the frequencies that make up a kick are reproduced by different drivers. The sound begins when the electrical signal starts, and it begins in the plane of each driver's voice coil. The ≈50Hz fundamental starts propagating from the woofer at the same time as the [say] ≈2250Hz beater whack does from the mid. If the woofer's voice coil sits 3 inches behind the mid's, the whack is ≈180° out of phase by the time the fundamental starts to reach your ear, because 3 inches is roughly half a wavelength at 2250Hz.
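A quick way to check the 180° figure is to turn the 3 inch offset into a time delay and then into phase at each frequency. This is a minimal sketch assuming a speed of sound of 343 m/s (dry air at roughly 20 °C); the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
INCH_M = 0.0254             # metres per inch


def phase_offset_deg(offset_in: float, freq_hz: float) -> float:
    """Phase lag in degrees (wrapped to 0..360) that a frequency picks up
    over an extra path length of offset_in inches."""
    delay_s = offset_in * INCH_M / SPEED_OF_SOUND_M_S
    return (delay_s * freq_hz * 360.0) % 360.0


delay_ms = 3 * INCH_M / SPEED_OF_SOUND_M_S * 1000.0
print(f"delay for 3 in: {delay_ms:.3f} ms")
print(f"phase at 2250 Hz: {phase_offset_deg(3, 2250):.1f} deg")
print(f"phase at 50 Hz: {phase_offset_deg(3, 50):.1f} deg")
```

The same ≈0.22 ms offset comes out at just under 180° for the beater whack but only about 4° for the 50Hz fundamental, which is why the displacement smears the attack rather than the low end.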
Time alignment attempts to minimize these displacement artifacts. Nothing is ever perfect, and there are crossover effects to consider as well, but properly aligned systems have a coherence missing in others. Our brain can correct for frequency variation, but it cannot shift time.
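In a DSP-driven active system, one way to time align is to delay the forward driver's signal by the geometric offset. The sketch below is an illustration only, assuming 343 m/s for the speed of sound and a 48 kHz sample rate; the function name is hypothetical. It also reports the phase error left at 2250Hz after rounding to a whole sample.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: room-temperature air
INCH_M = 0.0254             # metres per inch


def alignment_delay(offset_in: float, sample_rate_hz: int = 48_000) -> tuple[float, int]:
    """Return (delay in ms, nearest whole-sample delay) to apply to the
    forward driver so its output lines up with a driver recessed by offset_in inches."""
    delay_s = offset_in * INCH_M / SPEED_OF_SOUND_M_S
    return delay_s * 1000.0, round(delay_s * sample_rate_hz)


ms, samples = alignment_delay(3)
# Rounding to whole samples leaves a small residual timing error.
residual_s = samples / 48_000 - ms / 1000.0
residual_deg = residual_s * 2250 * 360.0
print(f"delay the mid by {ms:.3f} ms ({samples} samples at 48 kHz)")
print(f"residual error at 2250 Hz: {residual_deg:.1f} deg")
```

This covers only the physical offset. Real crossover filters add their own frequency-dependent delay, so in practice the final value is usually refined by measurement.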
Compared to live acoustic sound, all playback sucks. But as Nietzsche opined, "Without music, life would be a mistake." So we put up with it.