First Order Crossovers: Pros and Cons


I wonder if some folks might share their expertise on the question of crossover design. I'm coming around to the view that this is perhaps the most significant element of speaker design, yet I really know very little about it and don't really understand the basic principles. Several of the speakers I have heard in my quest for full-range floorstanders are "first order" designs. I have really enjoyed their sound but do not know if this is attributable primarily to the crossover design or to a combination of other factors as well. In addition, I have heard that, for example, because of the use of this crossover configuration on the Vandersteen 5, one has to sit at least 10 feet away from the speakers in order for the drivers to properly mesh. Is this really true, and if so, why? Another brand also in contention is the Fried Studio 7, which also uses a first order design. Same issue? Could someone share in layman's terms the basic principles of crossover design and indicate the advantages and disadvantages of each? Also, what designers are making intelligent choices in trying to work around the problems associated with crossover design? Thanks for your input.
dodgealum
The exchange between Roy and Golix above leads me to bring up something I have been thinking about for a long time – the phasing of the recorded music before it even reaches your stereo system.

I have read enough about 1st order crossovers and phasing to convince myself that this design philosophy is valid and that this approach is the only one that can achieve a time coherent design. To me the $20,000 question is – does it matter enough to outweigh all the other compromises that speaker designers must make?

Others have asked if time coherence, or lack thereof, is audible. Let’s assume the answer is yes. The next question: is it preferable? Obviously not to everyone. Many have auditioned Vandersteens, Thiels, and Meadowlarks and still chosen other brands.

Time-coherent designs with first order crossovers are the only designs that can come close to preserving the original waveform, most easily seen by measuring the step response. I totally agree with this. But music is not a square wave, so tracking something that doesn’t exist in natural music doesn’t prove you can make better music. Transients can be sharp, but they are not infinite-slope step functions.

Going back to my first paragraph, what about the phasing of the music recorded on that CD or LP? Even if a stereo system from source to speaker was perfectly able to preserve the original waveform, what are we trying to preserve? I know very little about the recording process, but I can bet many an album has been processed by recording engineers in ways that destroy the phasing of the instruments used to make the music. Some genres may be better than others. Some studios may be better than others. But how are we, the music buyers, supposed to know if the waveform we are buying is worth preserving? I would love to see some discussion on this HUGE factor. I have read Richard Vandersteen tests his speakers with his own recordings. I am sure those recordings have their phasing preserved. But doesn’t this say something about the phasing of recorded music in general?

Finally I would also like to see discussions on the real world trade-offs of using 1st order crossovers. Like small sweet spots. Is that inherent to all 1st order designs or do all good imaging direct radiating speakers have that? And is the sound outside the sweet spot worse for time coherent designs? If so why? I have only listened to Vandersteens (of the 1st order designs) and the difference in sound from sitting in the time coherence zone to standing up is quite alarming. There is something special going on in the sweet spot, but the Vandersteens sound flat when I stand up – treble drops right off. I know they just lost their time coherence, but when I perform the stand/sit test with my home speakers, the difference is far less dramatic. Do 1st order designs sound extra good in the sweet spot and extra bad everywhere else? Comments please.

Other inherent trade-offs of 1st order designs? Thanks in advance.
It seems Roy is correct, as usual.

But I do want to chime in on one point: There are companies which design and advertise phase coherent speakers without making claims of time alignment, such as VMPS or Fried. There is nothing wrong with doing this, although it's understood that Roy Johnson would not pursue or endorse this design decision.

Then there are companies whose ads specifically claim time coherence, when their multi driver speakers do not have first order crossovers and the baffles are not stepped back in any way. These ads are lying consciously, or someone in the engineering or marketing department is confused. Dali is the most recent and egregious example of this phenomenon, which, unlike VMPS or Fried, amounts to snake oil.
Applejelly:

You are indeed correct that recorded phase is a serious problem, and one which is utterly ignored by many recording engineers. However, some of them do take it seriously, along with many other aspects of their craft. So, in my opinion, it is well worth having a system that preserves time and phase. It can't fix the bad recordings, but at least it doesn't screw up the good ones.

About the step response: You are correct that music is not a square wave, but I will make the argument that the square wave (or step function) is the best test signal yet devised for predicting musical fidelity in a loudspeaker. The square wave, after all, is simply the mathematical summation of an infinite number of phase-matched sine waves in a specific frequency progression. As such, any transducer that passes a clean square wave is eminently qualified to perform well on any musical signal it will ever encounter.
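The "summation of phase-matched sine waves" point can be checked numerically. The following is a minimal sketch, not anything taken from the posts above: it assumes the standard Fourier series of a unit square wave (odd harmonics with amplitudes falling as 1/n), and the function name, the 200-harmonic cutoff, and the 90-degree shift are all illustrative choices. It builds the wave twice from the same harmonics at the same levels, once phase-matched and once with every harmonic shifted by the same fixed phase angle.

```python
import math

def square_partial_sum(x, n_harmonics, phase_shift=0.0):
    """Sum the first n_harmonics odd harmonics of a unit square wave.

    phase_shift (radians) is added to every harmonic; 0.0 keeps the
    harmonics phase-matched. Amplitudes are identical in both cases.
    """
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1                      # odd harmonics only: 1, 3, 5, ...
        total += math.sin(n * x + phase_shift) / n
    return 4.0 / math.pi * total

N = 200                                    # illustrative number of harmonics
xs = [2 * math.pi * i / 2000 for i in range(2000)]   # one full period

aligned = [square_partial_sum(x, N) for x in xs]
shifted = [square_partial_sum(x, N, math.pi / 2) for x in xs]

mid_plateau = square_partial_sum(math.pi / 2, N)     # middle of the flat top
peak_aligned = max(abs(v) for v in aligned)
peak_shifted = max(abs(v) for v in shifted)

print(f"flat-top value, phase-matched : {mid_plateau:.3f}")
print(f"peak level, phase-matched     : {peak_aligned:.2f}")
print(f"peak level, harmonics shifted : {peak_shifted:.2f}")
```

The phase-matched sum sits close to 1.0 on its flat top, while the shifted version, which has exactly the same frequency content and levels, shows peaks several times higher and no flat top at all. Under these assumptions, identical amplitude response does not imply an identical waveform.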

The reason that first-order designs change sound when going from sitting to standing has to do with lobing patterns, as discussed in previous posts. First-order designs have more overlap in output between the two drivers, so the lobing effects are more noticeable than with higher-order crossovers. However, there is an important point to keep in mind: The higher-order crossovers have non-uniform phase characteristics by definition, resulting in audible distortions of a different kind, even when listening on-axis. In other words, first-order is bad if you want to listen critically while standing up; higher-order is bad no matter where you are.
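To put rough numbers on "more overlap", here is a small sketch that compares how quickly a driver's output falls away beyond the crossover point for different filter orders. It assumes ideal Butterworth low-pass responses and a hypothetical 3 kHz crossover; real crossovers, drivers, and the speakers discussed here will differ.

```python
import math

def butterworth_lowpass_db(f_hz, fc_hz, order):
    """Magnitude in dB of an ideal Butterworth low-pass of the given order."""
    ratio = f_hz / fc_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))

FC = 3000.0   # hypothetical crossover frequency in Hz (an assumption)

for order in (1, 2, 4):
    one_octave = butterworth_lowpass_db(2 * FC, FC, order)
    two_octaves = butterworth_lowpass_db(4 * FC, FC, order)
    print(f"order {order}: {one_octave:6.1f} dB at one octave above, "
          f"{two_octaves:6.1f} dB at two octaves above")
```

In this idealized case the first-order woofer is only about 7 dB down an octave above the crossover point, against about 24 dB for a fourth-order filter, so both drivers of a first-order design contribute audibly over a much wider band. That wider shared band is where the vertical lobing comes from.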

I think that many people who listen to Vandys (etc.) and end up buying something else are doing so because of issues other than time-and-phase coherence. This is not the only thing that matters, not by a long shot. The Vandys are built to a specific budget point, and in their price range, there are other speakers that give higher resolution, tighter bass, more treble extension, a more neutral tonal balance, etc. Anyone who values one of these specific things highly could easily decide to buy something else.

However, having said that, it is hard to imagine a better "real-world" compromise at a given price point than what Vandersteen has achieved. As I am fond of saying, the 2C may not be "the best" in any one area, but it is the cheapest speaker ever made that can truly lay claim to having addressed all of the important fundamentals in loudspeaker design. Its status as the best-selling high-end speaker of all time supports this conclusion.
Karls, you said it well. Thank you. I should have included that for the benefit of those familiar with vector addition.

Sean, the answers to what you ask in your first two paragraphs lie in Karls' contribution, and the information you present in your last three paragraphs is correct. Yes, we are trying for a linear phase response across the board, which is not possible as you approach the frequency extremes, without digital correction (which has not, in my experience, been applied correctly to any speaker). There will be some thoughts on that on our new website by the time it is published, and if not, in the weeks following.

Skrivis, I think some of the reason Stereophile does not draw any conclusions is related to what Paul Candy says about that in his sixmoons.com review of our Callisto speakers, about alienating most of their speaker manufacturers. Part of it (again, in my opinion) is that their tests only extend down through the mid and tweeter crossover range, never down into the woofer/mid crossover range, which is where more of the music lies.

Finally, one must know what time-coherence really sounds like, perhaps by going to hear live music up close and personal, for hundreds of hours. Ever hear a string quartet practice for weeks in one's living room, and then get up and walk around them? Or a bluegrass band, or a wind band, or a Fender Twin Reverb, or a soprano, or a Steinway played with expert hands? From four feet away? How about experiencing a 100-voice chorale, on stage from 20 feet? Across weeks of rehearsals? I know this is what a speaker designer needs to do.

Those are not common experiences for anyone, let alone to have spent hundreds of hours in studios learning what the mic hears and how the studio must alter its sound, so that we think it is a) pleasing, and b) realistic.

Golix,
You say "Also a speaker which has the drivers on a vertical axis can only be phase coherent at one point in space. This of course is a purely geometrical problem independent of any electrical feature."
True. And in a two-way speaker, that point can be aligned to anywhere via tilting the speaker and with a small change in your ear height for the final touch.

For a three way, one has to move the top two drivers relative to the woofer. Our Continuum 1 three-way offered adjustable driver positions in 1995, and was reviewed in Audio Ideas Guide in '97. That adjustability has become a standard feature in all our three-way designs, and for several new models coming into production over the next many weeks. We call this "adjusting the Soundfield Convergence(tm)".

Golix, please do reconsider your notion that
"Basically a phase coherent speaker is one that is not only [coherent] in time but also in phase; a time coherent speaker is one that's [coherent] in time but not in phase."

To be correct, that should read nearly the opposite:
"Basically a phase coherent speaker is one that is ONLY coherent in phase; a time coherent speaker is one that's coherent in time AND in phase." All the math, all the physics supports that.

And we speaker designers get to screw that up! Nowhere in the recording chain, nor in the playback chain is the timing split between highs and lows, or the polarity, or both. This is purely a speaker phenomenon/distortion. One can learn to recognize that, the way an amplifier designer can hear if someone's amplifier needs a bigger transformer- it's a unique sound distortion.

Thus, Golix, please understand that while those Tannoys are indeed smoothly phase coherent, they are not time coherent. A step function would show this: Perfection in a step function looks like the plus-half of a single square wave, rising up quickly, leveling off and then going on forever- like a single stair step. There would be no ringing or rounding over at the initial corner, and the top would stay level forever, never returning to zero.

In that Tannoy, the first energy to arrive from that step-input is upwards-going, as it should be. A moment later, the late-arriving, inverted-polarity tweeter shoves (sucks) that initial positive air-pressure-increase down into the negative-air-pressure portion of the graph.

The tweeter's dome then returns to rest from its full "-" excursion, because the crossover cannot pass the "DC" to tell it to "hold your position, albeit sucked in". The air pressure then returns to the positive from the midrange tones' positive-pressure continuing to arrive.

Finally, no speaker ever then "levels off" and holds that air-pressure "positive" interminably, because the room leaks that pressure away. So the step droops back to zero, even though you see the woofer still shoved "out".
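The difference between a step that holds its shape and one that gets pulled negative can be sketched with textbook filters. This is an idealized illustration only, not a model of the Tannoy or of any measured speaker: it assumes perfect drivers at the same point in space, a hypothetical 3 kHz crossover, an ideal first-order crossover, and a second-order Linkwitz-Riley crossover with the tweeter wired in inverted polarity. The closed-form step responses used here follow from those assumed transfer functions. Unlike the coaxial case described above, the idealized tweeter is not delayed, so its negative swing appears at the very start rather than a moment later.

```python
import math

def first_order_step(t, w):
    """Step responses (woofer, tweeter) of an ideal first-order crossover."""
    woofer = 1.0 - math.exp(-w * t)      # low-pass leg rises toward 1
    tweeter = math.exp(-w * t)           # high-pass leg decays from 1
    return woofer, tweeter

def lr2_inverted_step(t, w):
    """Step responses of a second-order Linkwitz-Riley pair, tweeter inverted."""
    decay = math.exp(-w * t)
    woofer = 1.0 - (1.0 + w * t) * decay
    tweeter = -(1.0 - w * t) * decay     # minus sign = inverted polarity
    return woofer, tweeter

W = 2 * math.pi * 3000.0                 # hypothetical 3 kHz crossover, rad/s

for t_us in (0, 20, 50, 100, 300, 1000):
    t = t_us * 1e-6
    fo = sum(first_order_step(t, W))
    lr = sum(lr2_inverted_step(t, W))
    print(f"{t_us:5d} us   first-order sum {fo:+.3f}   LR2 sum {lr:+.3f}")
```

In this sketch the first-order sum stays at +1 for all time, which is the "single stair step". The second-order sum has flat amplitude response and its two drivers stay in phase with each other at every frequency, yet its step starts at -1 and only later climbs to +1: phase coherent between drivers, but not time coherent.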

With regard to our measurements- we have those measurements supplied by the driver manufacturers, such as MorelUSA, taken in their chambers. But those measurements are usually taken in a half-anechoic chamber. Go to the Scanspeak website here:

http://www.d-s-t.com/link/scs/data/d2008_851200c.htm
and click the frequency response graph to see their measurement setup.
Can anyone say that is a realistic test of the direct sound from a tweeter? You will see a picture come up of a woofer mounted in this fully-reflective wall of that test chamber- their tweeters are tested in that same position.

For our own testing, we are still in the analog days here, not for want of trying to go digital, so I cannot show you hard copy. I will be working to present this information on our website as we continue to grow.

I can tell you that our anechoic chamber is outdoors when required from 200Hz on up, which covers the woofer/mid crossover region. Testing indoors, in an average room, is fine for looking at what happens from 800Hz on up. There are also certain ways to combine very nearfield measurements, that I must decline to describe, which obviate the need for a chamber.

Digital test-gear has not shown us what we need to know any more accurately, and I have extensively used/leased all of the well-known systems available. I do know that it is easier to perform many more misleading tests in the digital domain. One has to scrutinize for many problems, with very specific measurements, either by analog or digital means. There is no one, or two, or three measurements, or even "dozens", that tell anyone how a speaker will actually sound. It takes many more than that, with an experienced ear listening for suspected deviations that physics is pointing out, and a working knowledge of what "a suspected (and/or measured) deviation or problem" should sound like.

In analog measurements, we look directly at the 'scope. In particular, we look at the moment of first arrival of a burst of 4-8 cycles of a single sine-wave tone, taken all the way up the frequency scale. Perfection in time-arrival means that each of those tone bursts, at each frequency tested, starts upwards from the zero-axis at the very same L-R position on the 'scope face.

The Tannoys would show a left-to-right motion of that starting point (which is the time delay creeping in) in the crossover range, and then the tone burst smoothly flips upside down in the tweeter range. Ours stand still from 200Hz to 8kHz, and always have the same polarity.

What does +/-2 degrees mean at 200Hz? It is +/- 1/180th of a 200Hz wave's period, or +/- 1/180th of 1/200th of a second, or about +/- 0.03 milliseconds, which is readable on a 'scope face. This, for the lower midrange, amounts to a front-to-rear shift of the mid-driver's location by +/- 0.4 inches, relative to the woofer. I can hear when the focus becomes as sharp in that crossover range as it is away from that range. So has every person for whom I have demo'd this.
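The conversion from a phase tolerance to a time and a distance is simple enough to sketch. The speed of sound below (343 m/s, dry air near 20 C) is an assumption, and the function names are made up for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 C
METERS_PER_INCH = 0.0254

def phase_to_time_s(phase_deg, freq_hz):
    """Time offset equal to phase_deg of one cycle at freq_hz."""
    return (phase_deg / 360.0) / freq_hz

def time_to_inches(time_s):
    """Distance sound travels in time_s, expressed in inches."""
    return time_s * SPEED_OF_SOUND_M_S / METERS_PER_INCH

for freq_hz in (200.0, 8000.0):
    dt = phase_to_time_s(2.0, freq_hz)
    print(f"{freq_hz:6.0f} Hz: +/- {dt * 1e6:6.2f} microseconds, "
          f"+/- {time_to_inches(dt):.4f} inches")
```

With these assumptions, 2 degrees at 200Hz comes out near 28 microseconds and a little under 0.4 inch, and at 8kHz it comes out under one microsecond and close to one one-hundredth of an inch.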

The audible change from moving that mid back and forth, even an eighth of an inch, cannot be explained by wave-cancellation math, nor can it be explained by any change in the cabinet-face or wall-surface reflections in my designs.

We hear the difference as a loss of sharpness, or definition, of a sound's location from front-to-rear. Depth is time delay, and the sharpness of the image begins at its front-most element (the singer's mouth). If that initial location is smeared from front-to-rear, then the depth "behind" that voice is also smeared over by that initial information, and the depth itself is also smeared in time.

This is all information audible by WHEN it arrives. If that initial location is smeared, we also hear a loss of attack, which is a leading-edge phenomenon- another time-domain aberration. There exist many more ways the ears can guide time-domain measurements, and vice versa.

For the 8kHz point, that +/- 2 degrees amounts to a spatial shift of +/- one one-hundredth of an inch- tough to measure: One can easily have the microphone inadvertently jiggling from floor vibrations, by that small amount. It can be heard however, as an overall clarity of the top end, because there are a lot of frequencies nearby that 8kHz- notably the ones all the way down to 4kHz- only one octave, one "undertone" away.

I can hear when the ribbon supertweeter in our previous Imago flagship-design is 1/32nd of an inch too close or too far away- it was crossed over from the Dynaudio dome tweeter at 8500Hz:
What moment did the stick strike a small bell or a triangle (which creates a very sharp and brief transient) relative to when did that instrument's actual tones emerge? They should have begun after the stick hit and then was removed from the metal body, right? Yet, the timing can be warped just enough so that one hears the stick-hit occurring AFTER the tones start. Now that is an un-natural sound anyone can identify! And the time-delay from this small offset of that supertweeter? Millionths of a second.
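Going the other way, from a physical offset to a delay, confirms the "millionths of a second" figure. This sketch assumes a speed of sound of 343 m/s and uses the 1/32nd-inch offset and 8500Hz crossover frequency quoted above; the variable names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 C
METERS_PER_INCH = 0.0254

offset_in = 1.0 / 32.0       # supertweeter misplacement from the text
crossover_hz = 8500.0        # crossover frequency from the text

delay_s = offset_in * METERS_PER_INCH / SPEED_OF_SOUND_M_S
phase_deg = 360.0 * crossover_hz * delay_s   # same delay expressed as phase

print(f"delay: {delay_s * 1e6:.2f} microseconds")
print(f"phase at {crossover_hz:.0f} Hz: {phase_deg:.1f} degrees")
```

Under these assumptions the offset works out to a little over 2 microseconds, or roughly 7 degrees of phase at the crossover frequency.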

The same thing happens when judging the firmness of the felt on a mallet on a tympani or vibraphone. Or no felt at all- just the sound of hard wood, or a large-diameter mallet head or a small one. They each make their own sound, which a time-coherent speaker easily reveals, even in the midst of an entire orchestra reaching its crescendo around them.

In the usual mid-to-tweeter crossover range, achieving precise focus lets us hear exactly when the singer's tongue leaves the roof of her mouth- important to her shaping that note. Or to the definition of any other instrument that requires half-mid and half-tweeter, such as tambourine, trumpet, guitar, piano...it's a long list that includes non-instrument wideband-sounds, such as applause and film "noises". Then include the distinctive sound of each one's ambience directly behind those events- there is much to listen for, that leads to more musicianship being heard.

Also, it is possible for nearfield, tweeter-only measurements to have a standing wave build up between the microphone and the tweeter's dome, on sinewave tones, which totally fouls up anything we are trying to measure. Changing that test-tone's frequency by just a few percent, or moving the microphone back just a 1/4 of an inch, makes a huge change in the sound pressure level at the microphone (again, on a sine wave).

This is somewhat related to how the notion of first-order speakers having comb-filtering effects comes about- from applying the math, and measuring, with specific single tones, which do not occur in music, especially if that particular frequency lies between the tones of the musical scale. I do agree with all of what Karls goes on to say in his post right above, including his analysis of lobing. However, I find lobing is exaggerated when the cabinet-face, or even the area right around the tweeter, is contributing many reflections.

Comb filtering, from simple, "fewest possible drivers", first-order speakers, is not apparent to me, or at least not objectionable, on music. The ultimate audible test was comparing what is heard out of a speaker's mid/tweeter crossover range, with what is heard in that tone range from a small, say 6" square, electrostatic panel, or a plasma tweeter. We have done that, and found no significant differences that we can say were from the comb-filtering effects that must indeed arise from having two drivers producing the same range.

Multiple drivers in the same tone range present a lot of different frequencies to cancel out, because those six tweeters, for example, each arrive later than the one nearest your ear. That leads directly to lobing, which is a frequency-dense form of comb filtering. What you want to call it depends on how you measure it.
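The single-tone math behind comb filtering can be sketched for the simplest case: two sources playing the same steady sine wave at equal level, with one arriving later because its path to the ear is longer. The 5 cm path difference and the 343 m/s speed of sound are assumptions chosen only for illustration, and the model ignores driver directivity, crossover slopes, and room reflections.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 C

def summed_level(freq_hz, path_diff_m):
    """Linear magnitude of two equal sine sources, one delayed by the path difference."""
    delay_s = path_diff_m / SPEED_OF_SOUND_M_S
    return abs(1.0 + cmath.exp(-2j * math.pi * freq_hz * delay_s))

def null_frequencies(path_diff_m, count):
    """First `count` cancellation frequencies: odd multiples of c / (2 d)."""
    base = SPEED_OF_SOUND_M_S / (2.0 * path_diff_m)
    return [(2 * k + 1) * base for k in range(count)]

PATH_DIFF_M = 0.05           # hypothetical 5 cm difference in path length

nulls = null_frequencies(PATH_DIFF_M, 3)
print("nulls (Hz):", [round(f) for f in nulls])
for freq_hz in (1000.0, nulls[0], 2 * nulls[0]):
    print(f"{freq_hz:7.0f} Hz: summed level {summed_level(freq_hz, PATH_DIFF_M):.3f}")
```

A steady tone landing exactly on a null cancels completely in this model, and one an octave above the first null sums to double. With more drivers, each with its own path length, the nulls fall at different frequencies and crowd together across the band.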

Stereophile thoroughly tested our original Diamante model in April 1994, and showed how its step response aligned quite well between mid and tweeter as the microphone was moved down to their time-coherent axis. JA was really nice to us by also showing how the corresponding step response also changed (for the better) as that time-coherence was achieved. He then showed how the overall phase response measured, which was pretty good. Ten years later, our deviation from zero is far less. Also, the Diamante tweeter's tone balance on that "best" axis was not flat for him, because at 50" away, the tweeter was well above the mic (the mic was far off the tweeter's axis). In the same issue, examine the B&W Silver Signature two-way monitor's step response. Not even close to being a step at all...The Diamante review is not archived on-line, unfortunately. Maybe those measurements are- I have not searched for those in JA's database he graciously offers.

Returning for a moment to the use of tone-bursts: One can also look at the envelope shape of each burst, from which many things can be seen, such as cones and cabinets flexing. If there is cone-breakup/ringing, then that energy was stolen from the initial input of energy into the cone. That is something that can be seen at the beginning of the envelope, as the output failed to reach full height on the very first cycles. The cone flexing absorbed that energy, only to give it back later. Of course the cone could be highly damped, then it never gives it back as audible sound, but just leaves the initial dynamic-rise blunted. Too "laid back" you would hear. Think about the dynamic response heard from soft plastic cones...

One can see a returning echo from inside the cabinet, after the end of the pulse, which can be fixed. A flex in a cabinet wall can be detected, and that can be stiffened. A reflection off the cabinet-face can be seen, and that can be absorbed or avoided. One does not need an anechoic chamber to perform those tests.

There is digital hope for us: This Summer, I look forward to working with Agilent Technologies in developing a system that will do what we need. A few years ago, the computing power was also not available for certain tests I have always wanted to make in the digital domain. Now it looks like it is.

My apology that this is so long, but I do not see this basic information published elsewhere. When (and if) you re-read it, it does seem to fit together. There were also a lot of good questions posed one after the other. Arnie of Audiogon, thank you so much for publishing this.

Applejelly:
You ask, "Even if a stereo system from source to speaker was perfectly able to preserve the original waveform, what are we trying to preserve? I know very little about the recording process, but I can bet many an album has been processed by recording engineers in ways that destroy the phasing of the instruments used to make the music."

The answer: yes, they destroy the phasing, just as you say. But please note that no studio effect ever splits the time-coherence of the signal anything like a speaker can. You are indeed trying to preserve the "original waveform", for nothing more than to reduce another audible distortion we don't need to hear. I hope that helps, because yours is a valid question. Karls, I would say that time coherence not only helps great recordings, but is very necessary to avoid "chewing up" distorted recordings. Think about how "distorted distortion" would sound.

Applejelly, you also ask, "And is the sound outside the sweet spot worse for time coherent designs?" Yes and no. It is better than severely phase-shifted speakers, because when you stand, you are not moving spatially as far off alignment as the other crossovers delayed the signals. And by direct comparison, the other speakers are scrambled even sitting down, and so your added "positional" phase shift does not add that much more. Karls says something on this, above.

You hear the difference on first-order speakers precisely because you have at least a focal point to compare, as you physically move away from it. I know that with proper attention to the speaker's design, at ten feet or more away, it does not feel like your head must be in a vise- I know now that is one artifact of the drivers "not quite being in full alignment- just very close". As that broadband alignment is widened and sharpened, the sweet spot relaxes.

About the highs going away when you stand? That is what happens with the particular design you heard. This is not inherent to "being a first-order speaker", but only to "that particular first-order speaker" you auditioned. What one can say with certainty is that when you stand, you always hear less depth to the image.

Also, you did hear the tweeter's sound emerge first, which is not natural, but it is emerging first by far less of a time "advance" than what higher-order speakers do. And if it emerges even more "too soon" when you stand up (still less "too soon" than with high-order speakers), then it can reach a relative location that lets it cancel the mid's output in the range above the mid's crossover point, and that means "less highs."

Thanks for the compliment, Suits_me. I don't think I have made any mis-statement, but please let me know if I foul up. Since this is my profession, I deeply feel I owe every bit of science, and knowledge of the sound of real music, and of how studios work, and how we hear, to my designs and our customers.

Thanks to all for reading through this. I hope you found it worth your time. I wish that I had someone tell me all of this when I started designing in 1973! My hope is that someone young picks up the ball and runs with it, to see what we have from them in thirty years, 'cause it probably won't be from me! It is part of what is behind my mention of a "Foundation" in sixmoons' Callisto review's Q&A at its end. Also there are all the topics I consider important to a speaker's design, before we even strap on a crossover.

Best regards,
Roy Johnson
Founder and Designer
Green Mountain Audio