UNDERSTANDING AUDIO SONOGRAMS
Posted on Wednesday, October 28, 2009 at 10:54 AM

UNDERSTANDING SONOGRAMS

Recent technology has given us the ability to "see" into many heretofore invisible parts of the world. From X-rays of broken bones, to CAT scans of the brain, to ultrasound images of yet-to-be-born infants, modern technology reveals important and useful information about our world. One such tool that can be very helpful for birders is the sonogram, or more accurately, the audio spectrogram.

Audio spectrograms (AS) allow birders to "see" inside a bird vocalization and can provide important clues on how to differentiate one call or song from another. Sometimes they help by showing subtle variations in short calls; other times by helping the birder recognize differences in the larger patterns of complex songs. Once these differences are discerned in spectrograms, they often become much easier to hear and differentiate in the field. In this article I'll provide some guidance on how to read audio spectrograms, and then I'll use them to demonstrate how they can help in differentiating the easy-to-confuse songs of the thrashers found in SE Arizona.

WHAT ARE AUDIO SPECTROGRAMS

An audio spectrogram is a two-dimensional graphical representation of an audio source. Spectrograms are created using a mathematical technique called Fourier analysis. It is based on the principle that any complex sound can be broken down into a set of simple pure tones (sine waves), each with its own frequency and strength. To make it possible to "see" a sound, a Fourier analysis is made of the audio and the resulting information is converted into graphical form.

Here's how the process works. First of all, the target sound is broken down into very short sections or samples, usually a millisecond or less in length. So a one-second sound would be broken down into 1,000 or more short samples. Each sample is then analyzed to find how much energy it contains at each frequency. This analysis is repeated for every short time "sample" for as long as the sound lasts.
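For readers curious about the mechanics, here is a minimal Python sketch of the steps just described. It slices the sound into short, overlapping frames and then measures how strong each frequency is within each frame. What the article calls a "sample" is called a frame here, because in audio software a "sample" usually means a single measurement of the waveform.

The sketch assumes the sound is already a list of numbers at a known sample rate. The frame length, step size, and Hann window (which tapers each frame's edges) are illustrative choices, not the settings of any particular sonogram program. Real programs use a fast Fourier transform (FFT) rather than the slow, written-out transform below, but the result is the same.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Strength of each frequency bin 0..N/2 in one frame (plain, slow DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len, hop):
    """Return one column of bin magnitudes per frame, in time order.

    Bin k of each column corresponds to k * sample_rate / frame_len Hz.
    """
    # Hann window: tapers frame edges so a tone's energy stays in nearby bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        columns.append(dft_magnitudes(frame))
    return columns


# Example: 0.1 s of a pure 1 kHz tone, measured 8,000 times per second.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
cols = spectrogram(tone, frame_len=64, hop=8)   # hop of 8 = one column per millisecond
peak_bin = max(range(len(cols[0])), key=lambda k: cols[0][k])
print(len(cols), peak_bin * SAMPLE_RATE / 64)   # column count, loudest frequency in Hz
```

At 8,000 measurements per second, stepping 8 at a time gives one column per millisecond. Each column's frequency bins are 8000 / 64 = 125 Hz apart, so the 1 kHz tone shows up as the strongest bin (bin 8) in every column.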
The resulting graphic is a collection of all of these instantaneous representations of frequency content, placed on a timeline. The horizontal axis is time, showing the length of the audio. The vertical axis is frequency, with dots or lines showing what content, if any, there was at each frequency.

SOME SIMPLE AUDIO EXAMPLES

Let's look at a very simple example of an audio spectrogram. Suppose we wanted to make an audio spectrogram of a whistle that started at a low pitch and rose gradually and smoothly to a very high pitch over 30 seconds. And let's say the analyzer sampled the sound at each of 30 frequencies, once every second. The resulting graphic would show one dot per sample, at the frequency of the tone at that moment. It would look like this:
Here's an audio spectrogram of 12 pure tones, each lasting a fraction of a second. The tones are in groups of three tones at the same pitch, and each group is lower in pitch than the prior group. The whole selection lasts about 4 seconds, from the 7-second mark to the 11-second mark on the time scale. Notice that the frequency of the first set of tones is about 1kHz, or 1,000 cycles per second. (Middle C on a piano is about 260 cycles per second.)
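Both the rising whistle and the stepped tones come down to the same question, asked once per time slice: which frequency is strongest right now? Here is a minimal Python sketch that answers it for a synthetic whistle rising smoothly from 500 Hz to 3 kHz in half a second.

The sweep, the 8,000-per-second sample rate, and the 128-sample frames are assumptions chosen to keep the example small. The list the sketch produces is the row of "dots" that would trace the rising line on the spectrogram.

```python
import cmath
import math

SAMPLE_RATE = 8000
FRAME = 128   # 16 ms frames; bins are SAMPLE_RATE / FRAME = 62.5 Hz apart


def loudest_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest DFT bin in a Hann-windowed frame."""
    n = len(frame)
    windowed = [frame[t] * (0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t in range(n)]
    mags = [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    best = max(range(len(mags)), key=lambda k: mags[k])
    return best * sample_rate / n


def rising_whistle(f_start, f_end, seconds, sample_rate):
    """A pure tone whose pitch rises linearly from f_start to f_end (a 'chirp')."""
    rate = (f_end - f_start) / seconds          # Hz gained per second
    out = []
    for t in range(int(seconds * sample_rate)):
        time_s = t / sample_rate
        out.append(math.sin(2 * math.pi * (f_start * time_s + 0.5 * rate * time_s ** 2)))
    return out


whistle = rising_whistle(500, 3000, 0.5, SAMPLE_RATE)
# One "dot" per non-overlapping frame: the pitch track of the whistle.
track = [
    loudest_frequency(whistle[i:i + FRAME], SAMPLE_RATE)
    for i in range(0, len(whistle) - FRAME + 1, FRAME)
]
print(track[0], track[-1])
```

Each frame covers 16 ms, and during that time the whistle rises about 80 Hz, a bit more than one 62.5 Hz bin. As a result, the dots climb steadily from around 540 Hz to around 2,940 Hz. Shorter frames would follow fast slurs more closely but blur the frequency readings. That is the basic trade-off every spectrogram program makes.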
If you take two sounds of different pitches but equal volume and play them at the same time, you would hear them both at once and it would sound like a chord. The AS would look like this:
Now let's take five sounds and stack them on top of each other. This time we'll make all but the lowest much softer and place each one an octave above the one below it. Instead of sounding like a chord, the result would sound like just one pitch, the pitch of the lowest note, but much richer than a simple sine wave. Tones stacked at whole-number multiples of the lowest frequency like this are called harmonics. When you are reading a sonogram of a bird song, it's important to remember that the more harmonics are visible in the audio spectrogram, the richer the sound.
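The chord and the stacked-octaves examples can be checked the same way: list the prominent peaks in a single analysis frame. This Python sketch assumes an 8,000-per-second sample rate and a 256-sample frame. It deliberately uses frequencies that fall exactly on the analysis bins (multiples of 31.25 Hz), so no window is needed. The particular tone frequencies and the softer levels of the upper tones are made-up illustrative values.

```python
import cmath
import math

SAMPLE_RATE = 8000
FRAME = 256          # bins are 8000 / 256 = 31.25 Hz apart


def mix(tones, n=FRAME, sample_rate=SAMPLE_RATE):
    """Sum of pure tones, given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in tones)
        for t in range(n)
    ]


def peaks(frame, sample_rate=SAMPLE_RATE, floor=0.01):
    """Frequencies (Hz) of every bin holding at least `floor` times the loudest bin."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    top = max(mags)
    return [k * sample_rate / n for k, m in enumerate(mags) if m >= floor * top]


# Two equally loud tones: two separate lines on the spectrogram (a chord).
chord = peaks(mix([(1000, 1.0), (1500, 1.0)]))

# Five tones, each an octave above the last, upper ones much softer.
stack_tones = [(125, 1.0), (250, 0.3), (500, 0.2), (1000, 0.1), (2000, 0.05)]
stack = peaks(mix(stack_tones))
print(chord)                            # the two chord tones
print([f / stack[0] for f in stack])    # each line as a multiple of the lowest

# Every component fits whole cycles into 64 samples (one 125 Hz period),
# so the combined wave repeats at the lowest tone's period.
wave = mix(stack_tones)
repeats = all(abs(wave[t + 64] - wave[t]) < 1e-9 for t in range(FRAME - 64))
print(repeats)
```

The chord shows up as two separate lines. The octave stack shows up as five lines at 1, 2, 4, 8, and 16 times the lowest frequency. Because the combined wave repeats at the period of the lowest tone, it is consistent with hearing the stack as a single, richer pitch.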
Here's how a simple upslurred bird call or song would look.
Now here's part of a song that is a fast upward-then-downward slur and contains two harmonics, making the song sound fairly "rich".
Notice the different notes, and also that each note has very fast changes in pitch caused by the vibrato.
If the pitch of the sound varies quickly and very widely, then it would sound like a trill. Here's the sonogram of a trill, in this case a Cedar Waxwing. Notice the harmonics that show the trill is fairly round or rich and not "dry".
Finally, if you stack a lot of unrelated sounds very close in pitch to each other, instead of hearing a rich tone, you'll hear noise. The "ultimate" noise is actually the simultaneous sounding of equal levels of every possible pitch you can hear as a human. Here's a sonogram of part of a Seaside Sparrow's song that contains almost pure, pitchless noise.
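Apart from the picture, the idea of "equal levels of every pitch" can be put into a number. One standard way is spectral flatness (sometimes called Wiener entropy), a measure that is not used in this article. It is the geometric mean of the power in each frequency bin divided by the ordinary (arithmetic) mean.

A pure tone puts all its power in one bin and scores near 0. A perfectly flat spectrum scores 1. A single short frame of random noise typically lands around one half, because its bin levels fluctuate randomly. The sample rate, frame size, and random seed below are arbitrary assumptions.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000
FRAME = 256


def power_spectrum(frame):
    """Power (squared magnitude) in each DFT bin 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of bin powers (near 0 = tonal, 1 = flat)."""
    powers = [p + eps for p in power_spectrum(frame)]   # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME)]     # random (white) noise
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME)]
flat_tone = spectral_flatness(tone)
flat_noise = spectral_flatness(noise)
print(round(flat_tone, 3), round(flat_noise, 3))
```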
So what does all of this nonsense, which once put you to sleep in physics class and is starting to sound pretty soporific now, have to do with birds? Well, plenty. To read an audio spectrogram effectively, you need to be able to interpret it in two main ways. First of all, you need to understand the tone of the bird's voice by seeing how many sounds are stacked at any one moment in time, and whether they look like harmonics or just amount to some kind of noise. Second, you need to follow how the pitch and rhythm of those sounds change from left to right, over time.
Here is the familiar spring song of a Northern Cardinal (familiar, at least, to those of us in the eastern US). Notice all of the harmonics, denoting a very rich song. The song starts with two very slow upslurs. It then continues with a very steady, fairly rapid sequence of rich tones. If you look closely, you can see that each of the rapid notes has a prominent downslur.
The Ovenbird's song starts quietly and increases in volume. As you can see in this sonogram, it also increases in richness of tone. You can also see the two parts of each song element ("tea cher... tea cher...").
COMPARING CHICKADEE CALLS

Now let's examine two different species' vocalizations and see how an AS might help us learn to differentiate them in the field. Here's an AS of a Black-capped Chickadee's "phoebe" call.
There are several useful things to notice about this call.
Now let's take a look at a Carolina Chickadee's call. Although these calls are often confused in the field, they are actually very different. And this difference is quite evident when you look at the audio spectrograms.
The most obvious difference is that the CACH's call has four notes vs. the BCCH's two. But let's look a bit deeper for some more revealing differences, since the BCCH can double its call and the CACH can truncate its call.

Another striking difference between the two calls is the pitch difference between the first and second notes in the CACH's call. Whereas the BCCH's two notes were relatively close in pitch, 4.5kHz and 3.75kHz, there is a big pitch jump from the first note to the second note in the CACH's call: from 6kHz to 3.5kHz. And oddly enough, the second note of the CACH's call is lower than the notes of the BCCH! This may explain some of the confusion caused by field guides that describe the CACH's call as higher than the BCCH's. Indeed it starts higher, but the second note is lower.

Take a look at the first two notes. Notice the differences in the harmonics between the first note, with only one harmonic, and the second note, with three harmonics. The first note is thin and pure sounding, the second more complex. Certainly the two notes do not sound as similar to each other as the BCCH's notes, which have basically the very same tonal quality.

Now look at the area between the first two notes of the CACH's call. You can clearly see a vertical line between the two notes that extends lower than the second note. Since the line indicates many different frequencies in the same very short period of time, this part of the call will not be a clear note but rather some kind of noise, and it will sound a bit lower than the second note. Since it's short and noisy, it will sound a bit like a hiccup or glitch in the song. This glitch is very obvious when you listen to the CACH's song and is very different from the two pure, simple notes of the BCCH.

Finally, the CACH's four-note call lasts about 1.5 seconds, about the same as the BCCH's two notes, so the BCCH's call sounds slower and more relaxed.
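The frequencies quoted above become easier to compare as musical intervals. The ear hears pitch in ratios, so a drop is best measured in semitones (12 per octave), computed as 12 × log2 of the frequency ratio.

Here is a short Python calculation using the approximate values read from the two spectrograms. The BCCH's drop works out to about 3 semitones (roughly a minor third), while the CACH's is over 9 (a little more than a major sixth). The calculation also shows that the CACH delivers its notes at about twice the BCCH's rate.

```python
import math


def semitones(f_high, f_low):
    """Pitch interval between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)


# First-to-second-note drops, as read off the two spectrograms.
bcch_drop = semitones(4500, 3750)   # Black-capped Chickadee
cach_drop = semitones(6000, 3500)   # Carolina Chickadee

# Both calls last about 1.5 s, but the CACH fits in twice as many notes.
bcch_rate = 2 / 1.5                 # notes per second
cach_rate = 4 / 1.5

print(f"BCCH drop {bcch_drop:.1f} semitones, CACH drop {cach_drop:.1f} semitones")
print(f"BCCH {bcch_rate:.1f} notes/s, CACH {cach_rate:.1f} notes/s")
```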
As you can see, an audio spectrogram can make it easier to "see" inside vocalizations and find the important differences between two species.
Notice that both the Worm-eating and Pine Warblers start their songs softly and then quickly increase to full volume. The Pine Warbler's trill also has a lot of variability in pitch. The Dark-eyed Junco's song shows two very distinct harmonics. This indicates that the song is much richer than the other trills. There is also a rich, very short note between each iteration of the trill, which can help in learning the song. The Chipping Sparrow's song is the simplest, with less internal variation in each individual note and an abrupt beginning and ending. Spending more time with these spectrograms will reveal some other differences as well.
Notice that the song starts with some relatively clear slurs, and that the buzzes have some remnants of harmonics and a distinct change of pitch. Since the Savannah Sparrow's trill shows much denser lines than the trills in the warbler examples above, it will sound much noisier. However, it won't sound as noisy as the Seaside or Nelson's Sharp-tailed Sparrows in the examples below.

NOISY VOCALIZATIONS

I mentioned above that noise is the simultaneous sounding of many unrelated, close pitches. In an audio spectrogram it looks like a big block of dark color. If the graphic of the noise shows black from the very bottom to the top of the AS, then there will be no pitch content at all; our ear will not hear the sound as having any pitch. However, if the "blob" is concentrated in one part of the audio spectrogram, then the noise can sound high or low, especially in relation to other noises in the song. Here are two songs that contain noise.
The noise at the end of the Seaside Sparrow's song is a very dense, fairly even "mass" in the AS. This indicates that the noise has very little pitch and just sounds like broad-spectrum noise. In addition to the noise, it's interesting to notice that before the last large section of noise there are three "notes," each a broad, simultaneous sounding of many pitches near each other. That indicates this section won't sound like a clear tone or a whistle. But the noise isn't monolithic. Notice that there are actually four different wide lines on top of each other, with space between them. This indicates the sound will have some of the characteristics of a tone with harmonics, so these three notes of the Seaside Sparrow's song will have a much more "musical" or rich sound. In fact, this change of character within the song is useful in separating the Seaside Sparrow from other sparrows that share its reedy habitat.
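Noise concentrated in one part of the spectrogram sounds "high" or "low." That can be made concrete with the spectral centroid, the energy-weighted average frequency of a frame. It is a standard measure, though not one this article uses.

This Python sketch builds two made-up bursts of band-limited noise by adding many tones with random levels and phases. One burst spans 500 to 1,500 Hz and the other spans 2,500 to 3,500 Hz. The sketch then shows that each burst's centroid falls inside its band. The bands, sample rate, frame size, and seed are illustrative assumptions.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000
FRAME = 256                          # bins are 31.25 Hz apart
BIN_HZ = SAMPLE_RATE / FRAME


def band_noise(lo_hz, hi_hz, rng):
    """Noise confined to lo_hz..hi_hz: one random-level, random-phase tone per bin."""
    bins = range(math.ceil(lo_hz / BIN_HZ), math.floor(hi_hz / BIN_HZ) + 1)
    parts = [(k, rng.random(), rng.uniform(0, 2 * math.pi)) for k in bins]
    return [
        sum(a * math.cos(2 * math.pi * k * t / FRAME + ph) for k, a, ph in parts)
        for t in range(FRAME)
    ]


def spectral_centroid(frame, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency (Hz) of one frame."""
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / sum(mags)


rng = random.Random(7)
low = spectral_centroid(band_noise(500, 1500, rng))
high = spectral_centroid(band_noise(2500, 3500, rng))
print(round(low), round(high))
```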
The calls of the Hooded Warbler and the Common Yellowthroat are basically pure noise. They are not full-spectrum noise, but there are no harmonics to add any richness to the calls. Looking closer, we can see that the COYE's call is actually made up of several very fast iterations of noise with small intervening spaces. This call, then, will sound more like a very fast rattle than a single monolithic sound. If you listen for this fast variation, the call becomes much easier to ID. In contrast, the Hooded Warbler's call is very monolithic. Notice also that it is, relatively speaking, a very long call note, and that it trails off toward the end. In fact, the note is more than twice as long as the COYE's call note.

The Chipping Sparrow's call note is very short and simple. It's basically one very fast event. Notice that, even though it is the highest of the call notes we're discussing, it has one harmonic, showing that it is a fairly rich tone.

The Yellow and Magnolia Warblers both have much more pitched call notes. Both show harmonics and considerable variation in pitch within the note. The Yellow Warbler's call note ends lower than it starts and has a fast up-and-down movement, with most of the energy of the call in the downward movement. The Magnolia Warbler's note is gentler sounding, with most of the energy of the call at the highest point in the call, followed by a short falling-off of the pitch.
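Counting the separate bursts in a rattle-like call such as the Common Yellowthroat's, or timing one long note such as the Hooded Warbler's, can be sketched by tracking short-term energy. The Python sketch below splits the sound into 5 ms frames and flags frames whose energy is more than 10% of the loudest frame's. It then reports each run of flagged frames as a burst, with start and end times in seconds.

The two test signals are hypothetical stand-ins, not measurements of real calls: five 20 ms noise bursts with 10 ms gaps, versus one unbroken 150 ms burst. The frame size and threshold are also assumptions.

```python
import random

SAMPLE_RATE = 8000


def find_bursts(samples, sample_rate, frame_ms=5.0, threshold=0.1):
    """Return (start_s, end_s) for each run of frames with high short-term energy."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    cutoff = threshold * max(energies)
    bursts, start = [], None
    for idx, energy in enumerate(energies + [0.0]):   # trailing 0.0 closes a final burst
        if energy > cutoff and start is None:
            start = idx                                # sound begins
        elif energy <= cutoff and start is not None:
            bursts.append((start * frame_len / sample_rate, idx * frame_len / sample_rate))
            start = None                               # gap begins
    return bursts


def bursts_of_noise(rng, on_ms, off_ms, repeats, sample_rate=SAMPLE_RATE):
    """Hypothetical test signal: `repeats` noise bursts, each followed by silence."""
    on = int(sample_rate * on_ms / 1000)
    off = int(sample_rate * off_ms / 1000)
    signal = []
    for _ in range(repeats):
        signal += [rng.gauss(0.0, 1.0) for _ in range(on)]
        signal += [0.0] * off
    return signal


rng = random.Random(3)
rattle = bursts_of_noise(rng, on_ms=20, off_ms=10, repeats=5)      # rattle-like call
long_note = bursts_of_noise(rng, on_ms=150, off_ms=10, repeats=1)  # one unbroken note

for name, sig in (("rattle", rattle), ("long note", long_note)):
    found = find_bursts(sig, SAMPLE_RATE)
    print(name, len(found), [round(end - start, 3) for start, end in found])
```

The same burst list also gives the gaps between bursts, which is one way to put numbers on the pauses discussed in the thrasher songs below.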
Now let's see how audio spectrograms can help us distinguish the songs of a difficult group of birds, the thrashers found in Southeast Arizona. We'll examine the songs using the following criteria: the rhythm of the song, whether there are stops or spaces in the song, the number of different song elements and how they vary, and the tonal or "pitched" range of the song. These audio spectrograms are of the first 7 or so seconds of the thrasher songs found in the Stokes Field Guide to Western Birds, if you'd like to listen along. The song of the Crissal Thrasher is a good place to start.
Notice first that the song is divided into very obvious sections, separated by clearly visible pauses, some short and some longer. The sections all look fairly different from each other, including a two-part slow slur, a trill, and a three-part faster slur. There are also clear variations in pitch from section to section.
The LeConte's Thrasher's song also has some long pauses, in fact even longer and more relaxed pauses than the Crissal's song. Now notice that the sections of the song all look much more similar to each other than the Crissal's do, indicating less variation in the song. Also, you can see a number of "spikes," tall, narrow lines indicating fast, chip-like notes. These chips seem to punctuate much of the song, unlike the Crissal's, which has a lot more variety made up mostly of slurs and trills. Finally, the pitch of this song stays in the same general range. Although individual sections have slurs, the basic pitch from section to section is very similar. Again, this is in contrast with the more variable Crissal's song.
The Bendire's song shows elements that are often repeated three or four times, so the song will have some feel of repetition; however, the sections are not set off from each other by pauses as they are in the Crissal's song. Notice that the pitch of the elements falls in the same basic range, but that many song elements have a fairly wide range of harmonics. That indicates the song will be fairly "rich" sounding: not thin and not deep. However, there won't be much of a sense of change in pitch. So the song will sound like a very rich, run-on collection of repeated sections.
Unlike the Bendire's and Sage Thrasher's songs, the Curve-billed Thrasher's song contains distinct pauses and much more variability in pitch. In that respect it resembles the Crissal's song. But notice that the pauses are very brief and infrequent, unlike the longer, more variable, and more relaxed pauses in the Crissal's song.
For the group with pauses, the Crissal's song is fairly relaxed and contains a lot of variety and pitch change. The LeConte's has even longer pauses, but its elements are much more similar, are repeated more times, and are often punctuated by chips as part of the repeated song elements. The Curve-billed's song is more "nervous," with fewer and shorter pauses. It contains chips that are their own elements, not part of a larger pattern. And there is much less repetition of song elements than in the more relaxed LeConte's and Crissal's songs.
CONCLUSION
Copyright 2009 Tom Stephenson