Transient response. One of the most misunderstood terms I've seen used in the world of audio. Some people will tell you it has to do with moving mass, motor strength, inductance, and an entire host of things. Forget what you've heard for a moment. Let's define "transient response" as how fast a speaker starts, and how fast it stops in reaction to a given input signal. You'll also hear this term referred to as "stored energy".
I can tell you right now that no single manufacturer spec or formula is going to give you an all-encompassing, meaningful measure of the transient response of a speaker. Transient response is something that must be measured firsthand.
Let's first investigate the transient response of several tweeters at 2 kHz, using a shaped tone burst. Simply put, the speaker is excited with a special test signal (much like a quick ping) that is centered directly at 2 kHz, and a graph is generated showing the initial rise time and the decay time. A perfect transient response would show no delay during the rise of the signal, and no delay or bumps in the decay.
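To make the "quick ping" a little more concrete, here is a minimal sketch of one common way to build a shaped tone burst: a sine wave multiplied by a smooth envelope so it fades in and out instead of switching on abruptly. The Hann (raised cosine) envelope, the 5-cycle length, the 48 kHz sample rate, and the function name are all my assumptions for illustration; the measurement software used for the graphs here may shape its burst differently.

```python
import math

def shaped_tone_burst(freq_hz=2000.0, cycles=5, sample_rate=48000):
    """Return samples of a sine at freq_hz under a Hann envelope.

    cycles and sample_rate are illustrative guesses, not values
    taken from the measurements discussed in the text.
    """
    n = int(round(cycles * sample_rate / freq_hz))  # total samples in the burst
    burst = []
    for i in range(n):
        # Hann envelope: rises smoothly from 0 to 1 and back to 0
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
        tone = math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        burst.append(envelope * tone)
    return burst

burst = shaped_tone_burst()
print(len(burst), "samples in the burst")
```

The smooth envelope keeps most of the burst's energy close to 2 kHz, which is why the test can be said to look at one frequency at a time.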
Looking at the response of all three tweeters, notice the initial rise time between 0 and 1 millisecond. They all rise rather quickly, and there are no issues here.
However, we can see that after 1 ms there is a drastic difference between the three tweeters in the time it takes for the signal to decay. That is what is commonly referred to as "overhang" or "stored energy". The signal has stopped playing, but the speaker has not.
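If you have the measured response as a list of samples, the overhang can be put into a number. The sketch below is one possible way of doing it, not a standard: it reports how long after the signal stops the response still pokes above a threshold relative to its peak. The -40 dB threshold, the function names, and the two simulated "drivers" (simple exponentially decaying sine waves) are all hypothetical and are not the tweeters from the graph.

```python
import math

def overhang_ms(response, stop_index, sample_rate, threshold_db=-40.0):
    """Time after stop_index until the response stays below threshold_db (re. peak)."""
    peak = max(abs(s) for s in response)
    limit = peak * 10.0 ** (threshold_db / 20.0)
    last_loud = stop_index
    for i in range(stop_index, len(response)):
        if abs(response[i]) > limit:
            last_loud = i  # remember the latest sample still above the limit
    return 1000.0 * (last_loud - stop_index) / sample_rate

def simulated_response(tau_ms, freq_hz=2000.0, sample_rate=48000,
                       burst_ms=1.0, total_ms=6.0):
    """Hypothetical driver: plays a tone for burst_ms, then rings down with time constant tau_ms."""
    stop = int(burst_ms * sample_rate / 1000.0)
    n = int(total_ms * sample_rate / 1000.0)
    out = []
    for i in range(n):
        tone = math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        if i >= stop:
            tone *= math.exp(-(i - stop) / (tau_ms * sample_rate / 1000.0))
        out.append(tone)
    return out, stop

fast, stop = simulated_response(tau_ms=0.2)  # well-damped example
slow, _ = simulated_response(tau_ms=0.8)     # example with more stored energy
print("fast:", overhang_ms(fast, stop, 48000), "ms")
print("slow:", overhang_ms(slow, stop, 48000), "ms")
```

A single number like this throws away the shape of the decay (bumps, secondary ridges), so it is a complement to looking at the graph, not a replacement.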
As we can see, the black line Seas metal dome tweeter has the fastest decay time. That is not surprising, given that metal domes are often stiff and fast to respond.
The red line Max-fidelity tweeter lives up to its name with only slightly worse decay time than the Seas.
Lastly, we can see the poor Morel dome in blue. It is clearly the worst performer of the three, taking quite a bit of time to come to a stop.
Let's examine another graph, this time a waterfall plot. Waterfall plots are a slightly different way of looking at transient response.
Looking at the above graph, we can see it looks much like a frequency response graph, except that there's a third dimension/axis that represents time. A waterfall plot shows you the frequency response over time.
Now we should expect that for higher frequencies, the decay times would be very quick. For lower frequencies, the decay times should be progressively longer. For example, a 10 kHz sine wave is 10,000 cycles per second. That means it only takes 0.1 millisecond to play 1 cycle of a 10 kHz sine wave. For a 1 kHz sine wave, it takes 1 millisecond, and for a 100 Hz sine wave it takes 10 milliseconds.
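The arithmetic above is just the period of a sine wave, one divided by the frequency. A tiny sketch, with a function name of my own choosing:

```python
def period_ms(freq_hz):
    """Time in milliseconds to play one full cycle at freq_hz."""
    return 1000.0 / freq_hz

for f in (10000, 1000, 100):
    print(f, "Hz ->", period_ms(f), "ms per cycle")
```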
What's critical to look for are frequencies that decay very slowly along the time axis. We know that higher frequencies should decay quickly, and if they don't we have a problem.
If you look at the graph above, you can see at 10 kHz there is a clear problem. There's a very obvious ridge that doesn't disappear for nearly 3 ms. The longer decay times below 2 kHz are fine however, because at lower frequencies we would expect it to take longer for a signal to "finish playing".
In this case, we wouldn't want to use this speaker anywhere near 10 kHz.
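For the curious, here is a rough sketch of how a waterfall-style plot can be computed from an impulse response: move the start of the analysis forward in time and check how much level is left at each frequency. This is my own simplified illustration, not how any particular measurement package does it (real tools use windowing and a full FFT). The impulse response is synthetic and entirely made up: a well-damped component at 3 kHz plus a ringing one at 10 kHz.

```python
import cmath
import math

def magnitude_db(samples, freq_hz, sample_rate):
    """Level at a single frequency (unnormalised DFT bin, in dB)."""
    acc = 0j
    for n, s in enumerate(samples):
        acc += s * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
    return 20.0 * math.log10(abs(acc) + 1e-12)

def waterfall(impulse_response, sample_rate, freqs_hz, slice_starts_ms):
    """One row per start time: the level left at each frequency from that time on."""
    rows = []
    for start_ms in slice_starts_ms:
        start = int(start_ms * sample_rate / 1000.0)
        tail = impulse_response[start:]
        rows.append([magnitude_db(tail, f, sample_rate) for f in freqs_hz])
    return rows

# Hypothetical impulse response, 5 ms long at 48 kHz.
fs = 48000
ir = []
for n in range(240):
    t_ms = 1000.0 * n / fs
    damped = math.exp(-t_ms / 0.1) * math.sin(2.0 * math.pi * 3000 * n / fs)
    ringing = math.exp(-t_ms / 1.0) * math.sin(2.0 * math.pi * 10000 * n / fs)
    ir.append(damped + ringing)

rows = waterfall(ir, fs, freqs_hz=[3000, 10000], slice_starts_ms=[0.0, 1.0, 2.0])
for start_ms, row in zip([0.0, 1.0, 2.0], rows):
    print(start_ms, "ms:", [round(level, 1) for level in row], "dB at 3 kHz, 10 kHz")
```

In the printed rows the 10 kHz column fades much more slowly than the 3 kHz column, which is the numerical version of a ridge running along the time axis.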
So we can see here that shaped tone burst tests are great for examining the start and stop times of a speaker at a single frequency in great detail. The waterfall plots however, are great for examining a speaker's entire frequency response over time and finding "trouble spots"... frequencies that continue to play well after the original signal has stopped.
Now, probably the most important thing is what does poor transient response sound like? A speaker with poor transient response generally sounds dull and veiled. "Cloudy" would be the appropriate word. In most cases the differences are somewhat less dramatic, and below about 200 Hz I would say they are almost inaudible.
For example, at 20 Hz it takes 50 milliseconds to play 1 full cycle. So a 1 or 2 millisecond "overhang" is practically inaudible. At 1 kHz however, it only takes 1 ms for a full cycle, so that extra 1-2 ms it takes to decay is actually quite significant.
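One way to see why the same overhang matters more at higher frequencies is to express it in cycles instead of milliseconds. A small sketch (the function name is mine):

```python
def overhang_in_cycles(overhang_ms, freq_hz):
    """How many full cycles at freq_hz fit into overhang_ms of extra decay."""
    return overhang_ms * freq_hz / 1000.0

for f in (20, 1000):
    print("2 ms of overhang at", f, "Hz =", overhang_in_cycles(2.0, f), "cycles")
```

At 20 Hz a 2 ms overhang is a small fraction of a single cycle, while at 1 kHz the speaker keeps going for two full cycles after the signal has stopped.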
One last important note... don't assume that sloppy, boomy bass, muddy midrange, or dull lifeless tweeters are the result of poor transient response. The effect of poor transient response is more like a haze, or dulling of the sound. The most extreme example I could think of is playing music inside a large gym with hard floors and walls... think of all the echoes and reflections that kind of wash out the sound. The music is actually continuing to play long after the original signal has stopped, due to all the echoes.
Sloppy bass, on the other hand, is often due to excessive distortion and improper tuning. The speaker is playing signals that weren't originally present, making it sound harsh, muddy, and well... distorted. Also, an imbalance in the frequency response can lead to any speaker sounding muddy, peaky, or slow. The sound is quite different from that of poor transient response, if you know what to listen for.