UNDERSTANDING AUDIO SONOGRAMS

Posted on Wednesday, October 28, 2009 at 10:54 AM


Recent technology has given us the ability to "see" into many heretofore invisible parts of the world. From X-rays of broken bones, to CAT scans of the brain, to ultrasound images of yet-to-be-born infants, modern technology reveals important and useful information about our world.

One such tool that can be very helpful for birders is the sonogram, or more accurately, the audio spectrogram. Audio spectrograms (AS) allow birders to "see" inside a bird vocalization and can provide important clues on how to differentiate one call or song from another. Sometimes they help by showing subtle variations in short calls; other times by helping the birder recognize differences in the larger patterns of complex songs. Once these differences are discerned in spectrograms, they often become much easier to hear and differentiate in the field.

In this article I'll provide some guidance on how to read audio spectrograms, and then I'll demonstrate how they can help differentiate the easy-to-confuse songs of the thrashers found in SE Arizona.

WHAT ARE AUDIO SPECTROGRAMS

An audio spectrogram is a two-dimensional graphical representation of an audio source. Spectrograms are created using a principle called Fourier analysis, which states that complex phenomena like sounds, other physical phenomena, or even equations can be more easily understood when they are broken down into simpler pieces. For a sound, those pieces are pure tones at different frequencies.

To make it possible to "see" a sound, a Fourier analysis is made of the audio and the resulting information is converted into graphical form. Here's how the process works.

First of all, the target sound is broken down into very short sections or samples, usually a millisecond or less in length. So a one second sound would be broken down into 1,000 or more short samples.
 
The analyzer then checks each sample to see if there is any sound present at that moment in time. If audio is present, it checks the sound at each of many different frequencies to determine which frequencies are present at that time. Any audio content in a band is then represented graphically by a short line or dot at each frequency present at the time of the sample. The intensity or loudness of the audio at each frequency is usually represented as a lighter or darker mark, on a continuum from very soft (light mark) to very loud (much darker mark).
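The light-to-dark idea can be sketched in a few lines. The five-character scale and the function name here are my own illustrative assumptions, not how any particular analyzer draws its output.

```python
SHADES = " .:*#"   # lightest to darkest (an assumed five-step scale)


def shade(level, loudest):
    """Map a loudness value in 0..loudest to a character; louder is darker."""
    if loudest <= 0:
        return SHADES[0]
    step = int(level / loudest * (len(SHADES) - 1) + 0.5)
    return SHADES[max(0, min(step, len(SHADES) - 1))]


# Loudness at five frequencies at one moment in time (made-up values).
column = [0.0, 0.1, 0.5, 1.0, 0.3]
print("".join(shade(v, max(column)) for v in column))
```

The loudest frequency gets the darkest mark, and anything very soft stays blank.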

This analysis is repeated for every short time "sample" for as long as the sound lasts. The resulting graphic is a collection of all of these instantaneous representations of frequency content placed on a time line. The horizontal axis is time, showing the length of the audio. The vertical axis is frequency, with dots or lines showing what, if any, content there was at each frequency.
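For readers who like to tinker, here is a minimal sketch of the process just described, written in plain Python. The sample rate, the 8 ms frame length, and the two-tone test signal are assumptions chosen to keep the arithmetic tidy. Real analyzers usually overlap their frames and smooth the edges of each one; this sketch does neither.

```python
import cmath
import math


def spectrogram(samples, frame_len):
    """Cut the sound into back-to-back frames and return, for each frame,
    a list of magnitudes (one per frequency row, lowest frequency first)."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        column = []
        # For real audio only rows 0..frame_len//2 carry distinct information.
        for k in range(frame_len // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            column.append(abs(total) / frame_len)
        columns.append(column)
    return columns


# Assumed test signal: half a second at 1000 Hz, then half a second at 2000 Hz.
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 64       # 64 samples = 8 ms per frame
signal = [
    math.sin(2 * math.pi * (1000 if n < SAMPLE_RATE // 2 else 2000) * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]

columns = spectrogram(signal, FRAME_LEN)
bin_hz = SAMPLE_RATE / FRAME_LEN   # the frequency rows are 125 Hz apart
loudest_hz = [col.index(max(col)) * bin_hz for col in columns]
print(loudest_hz[0], loudest_hz[-1])   # prints 1000.0 2000.0
```

Each entry in `columns` is one vertical slice of the picture. Placing the slices side by side gives time along the horizontal axis and frequency along the vertical axis.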

SOME SIMPLE AUDIO EXAMPLES

Let's look at a very simple example of an audio spectrogram. Suppose we wanted to make an audio spectrogram of a whistle that started at a low pitch and gradually and smoothly rose to a very high pitch over 30 seconds. And let's say the analyzer sampled the sound at each of 30 frequencies, once every second. The resulting graphical representation of the sound would show one dot at each sample for the frequency of the tone at that time. It would look like this:
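Since the picture is just a rising diagonal, a rough text version is easy to sketch. This assumes the whistle passes through exactly one of the 30 frequency bands in each one-second sample; the function name is made up for the example.

```python
def rising_whistle_grid(steps=30):
    """Return text rows (highest frequency first) with one '*' per time sample."""
    rows = []
    for freq_band in range(steps - 1, -1, -1):   # top row = highest band
        row = "".join("*" if t == freq_band else "." for t in range(steps))
        rows.append(row)
    return rows


grid = rising_whistle_grid()
print("\n".join(grid))
```

The printed grid has time running left to right and pitch running bottom to top, so the whistle shows up as a line of marks climbing from the lower left corner to the upper right.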


Now let's look at the audio spectrogram of a simple sine wave. If you remember back to your physics class (you weren't sleeping were you...) a sine wave is the purest of all tones. It consists of only 1 pitch, with no overtones, and is similar to the sound you would hear from a flute or a very pure whistle. A sonogram of a one second sine wave at one pitch would have only one line, representing the pitch of the sound, and the length would be one second's worth of distance on the graphic.


A second pure whistle of the same length, but at a lower pitch, would have a single line also lasting a second, but the line would be lower on the graphic than the first line.

Here's an audio spectrogram of 12 pure tones. The tones are in groups of 3 tones at the same pitch, and each group lasts about 1 second. Each group is lower in pitch than the prior group. The whole selection lasts about 4 seconds, from the 7 second mark to the 11 second mark on the time scale.

Notice that the frequency of the first set of tones is about 1kHz or 1,000 cycles per second. (Middle C on a piano is about 260 cycles.)


Here's a sonogram of "Row, Row, Row Your Boat" performed with a flute-like pure tone, which has no overtones.

 

 

If you take two sounds of different pitches but equal volume and play them at the same time, you would hear them both at once and it would sound like a chord. The AS would look like this:

 

Now let's take five sounds and stack them on top of each other. This time we'll make all but the lowest much softer and place each an octave higher than the one below it. Instead of sounding like a chord, it would sound like just one pitch, the pitch of the lowest note. However, it would sound much richer than a simple sine wave.
If you remember back to your physics class again, this is what happens when a bow excites a string and the resulting sound consists of one or more harmonics. The more harmonics, the richer the sound.

When you are reading a sonogram for a bird song, it's important to remember that the more harmonics visible in the audio spectrogram, the richer the sound.
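To make the idea of stacked lines concrete, here is a small sketch that builds a pure tone and an octave-stacked tone and lists the frequencies at which each would show a line. The 200 Hz fundamental, the amplitudes, and the 1% loudness cutoff are all assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N = 400              # 0.05 s of audio, so frequency rows are 20 Hz apart


def spectrum_lines(partials):
    """partials: list of (frequency_hz, amplitude) sounded together.
    Returns the frequencies whose strength is at least 1% of the strongest."""
    samples = [
        sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in partials)
        for n in range(N)
    ]
    mags = []
    for k in range(N // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * n / N)
                    for n, x in enumerate(samples))
        mags.append(abs(total))
    threshold = 0.01 * max(mags)
    return [k * SAMPLE_RATE / N for k, m in enumerate(mags) if m >= threshold]


pure = spectrum_lines([(200, 1.0)])
# Lowest tone at full volume, four softer tones each an octave above the last.
rich = spectrum_lines([(200, 1.0), (400, 0.2), (800, 0.2), (1600, 0.2), (3200, 0.2)])
print(pure)   # [200.0]
print(rich)   # [200.0, 400.0, 800.0, 1600.0, 3200.0]
```

The pure tone produces a single line, while the stacked tone produces five lines at once. The ear still hears the stacked tone as one pitch at 200 Hz, only richer.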

 

 

Here's what a simple upslurred bird call or song would look like.

 

Now here's part of a song that is a fast upward then downward slur and contains two harmonics, making the song sound fairly "rich".

 


If the pitch of the sound varies very quickly during its duration, but only by a small change in pitch, then this would be visible on an AS as a ripple or wave in the graphic. Here's an example of part of "row row row your boat" played with a sine wave that has vibrato, or fast, small variations in its pitch.

Notice the different notes, and also that each note has very fast changes in pitch caused by the vibrato.
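The ripple that vibrato draws can be modeled as a steady pitch with a small, fast wobble added to it. The numbers below (a 1000 Hz note wobbling by 30 Hz, six times per second) are assumptions picked only to show the shape.

```python
import math


def vibrato_track(center_hz, depth_hz, rate_hz, duration_s, step_s=0.01):
    """Pitch of a note with vibrato, sampled every step_s seconds."""
    steps = round(duration_s / step_s)
    return [
        center_hz + depth_hz * math.sin(2 * math.pi * rate_hz * i * step_s)
        for i in range(steps + 1)
    ]


track = vibrato_track(center_hz=1000, depth_hz=30, rate_hz=6, duration_s=0.5)
print(round(min(track)), round(max(track)))
```

Plotted against time, these values trace a wavy line centered on 1000 Hz that never strays more than 30 Hz from it. The note still reads as one pitch, but the line ripples instead of running flat.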

 

 

If the pitch of the sound varies quickly and very widely, then it would sound like a trill. Here's the sonogram of a trill, in this case a Cedar Waxwing. Notice the harmonics that show the trill is fairly round or rich and not "dry".

 

 

Finally, if you stack a lot of unrelated sounds very close in pitch to each other, instead of hearing a rich tone, you'll hear noise. The "ultimate" noise is actually the simultaneous sounding of equal levels of every possible pitch you can hear as a human. Here's a sonogram of part of a Seaside Sparrow's song that contains almost pure, pitchless noise.



In summary, the more simultaneous sounds that are harmonics of the lowest tone, the richer the sound. The more simultaneous sounds that are not harmonics, the noisier or raspier the sound. Keep this in mind as we now look at some bird songs.
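One rough way to express this rule of thumb is to check whether the lines you see sit at whole-number multiples of the lowest one. The function name, the 3% tolerance, and the example frequencies are assumptions, and a real recording would need a more forgiving test.

```python
def looks_harmonic(frequencies_hz, tolerance=0.03):
    """True if every frequency is close to a whole-number multiple of the lowest."""
    lowest = min(frequencies_hz)
    for f in frequencies_hz:
        ratio = f / lowest
        if abs(ratio - round(ratio)) > tolerance * ratio:
            return False
    return True


print(looks_harmonic([2000, 4000, 6000, 8000]))   # evenly stacked lines: True
print(looks_harmonic([2000, 2130, 2290, 2410]))   # crowded, unrelated lines: False
```

Evenly spaced lines above the lowest tone pass the test and point to a rich sound. Lines crowded together at unrelated pitches fail it and point to a noisy or raspy one.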

 
ON TO BIRD SONGS

So what does all of this nonsense, that once put you to sleep in physics class and is starting to sound pretty soporific now, have to do with birds? Well, plenty. In order to read an Audio Spectrogram effectively, you need to be able to interpret it in two main ways.

First of all, you need to understand the tone of the bird by seeing how many sounds are stacked at any one moment of time and if they look like harmonics or just amount to some kind of noise.
And secondly you need to interpret the rhythm and patterns of the song as it unfolds over time.


Let's take a look at a couple of simple bird songs that demonstrate some of the basics we have been discussing above.
Here is an audio spectrogram of the very clear tones of a Lesser Yellowlegs. Notice the harmonics, denoting a rich tone, and the downward slur of each note.

 

 

Here is the familiar spring song of a Northern Cardinal (familiar at least for those of us in the Eastern US). Notice all of the harmonics, denoting a very rich song. The song starts with two very slow upslurs. It then continues with a very steady, fairly rapid sequence of rich tones. If you look closely you can see that each of the rapid notes has a prominent downslur.

 

 

The Ovenbird's song starts quietly and increases in volume. As you can see in this sonogram, it also increases in richness of tone. You can also see the two parts of each song element ("tea-cher ... tea-cher ...").

 


COMPARING CHICKADEE CALLS

Now let's examine two different species' vocalizations and see how an AS might help us learn to differentiate them in the field.

Here's an AS of a Black-capped Chickadee's "phoebe" call.

 

There are several useful things to notice about this call.
First of all, the sound is very pure. There is a basic pitch and a couple of harmonics. We're pretty sure they are harmonics and not noise because they are evenly distributed above the fundamental pitch. Both notes of the two-part call have the same level of "purity" since they contain about the same harmonic content. So they will sound similar in quality.


The first tone falls a bit in pitch, but not a lot. The second tone is lower than the first.
There's a clean break between the two notes, so they will sound distinct and separate.

Now let's take a look at a Carolina Chickadee's call. Although these calls are often confused in the field, they are actually very different. And this difference is quite evident when you look at the audio spectrograms.

 

 

The most obvious difference is that the CACH's call has four notes vs the BCCH's two. But let's look a bit deeper to see some more revealing differences, since the BCCH can double its call or the CACH truncate its call.

Another striking difference between the two calls is the pitch difference between the first and second notes in the CACH's call. Whereas the BCCH's two notes were very close to the same pitch, 4.5kHz to 3.75kHz, there is a big pitch jump from the first note to the second note in the CACH: from 6kHz to 3.5kHz. And oddly enough, the second note of the CACH's call is lower than the notes of the BCCH! This can explain some confusion caused by field guides that describe the CACH as being a higher call than the BCCH. Indeed it starts higher, but the second notes are lower.
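To put numbers on "close" versus "a big jump," the frequencies read off the spectrograms can be converted into musical semitones (12 to the octave). This is a small worked calculation using the values quoted above; the function name is my own.

```python
import math


def semitones_between(high_hz, low_hz):
    """Size of a pitch drop in equal-tempered semitones (12 per octave)."""
    return 12 * math.log2(high_hz / low_hz)


bcch_drop = semitones_between(4500, 3750)   # Black-capped Chickadee, note 1 to note 2
cach_drop = semitones_between(6000, 3500)   # Carolina Chickadee, note 1 to note 2
print(round(bcch_drop, 1), round(cach_drop, 1))   # prints 3.2 9.3
```

So the BCCH drops by roughly a minor third, while the CACH drops by roughly a major sixth, about three times as far.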

Take a look at the first two notes. Notice the differences in the harmonics between the first note, with only one harmonic, and the second note, with three harmonics. The first note is thin and pure sounding, the second more complex. Certainly the two notes do not sound as similar to each other as the BCCH's notes, which are basically the very same tonal quality.

Now look at the graphic area between the first two notes of the CACH's call. You can clearly see a line between the two notes that extends lower than the second note. Since the line indicates many different frequencies in the same very short period of time, this part of the call will not be a clear note, but rather some kind of noise. And it will sound a bit lower than the second note. Since it's short and noisy, then, it will sound a bit like a hiccup or glitch in the song. This glitch is very obvious when you listen to the CACH's song and is very different from the two pure, simple notes of the BCCH.

Finally, the CACH's four-note call lasts about 1.5 seconds, the same length as the two notes of the BCCH. The BCCH's call therefore sounds slower and more relaxed.

As you can see, an audio spectrogram can make it easier to "see" inside vocalizations and find the important differences between two species.


TRILLS AND NOISY CALLS


Trills usually consist of the same note or inflected note repeated many times in a very short period of time.
Here are four fairly similar bird songs that are trills. The audio spectrograms reveal some interesting points about each song.

Notice that both the Worm-eating and Pine Warblers start their songs softly and then quickly increase to full volume. The Pine Warbler's trill also has a lot of variability in pitch.

The Dark-eyed Junco's song shows two very distinct harmonics. This indicates that the song is much richer than the other trills. There is also a rich, very short note between each iteration of the trill that would help in learning the song.

The Chipping Sparrow's song is the simplest, with less internal variation in each individual note and abrupt beginning and ending. Spending more time with these spectrograms will reveal some other differences as well.


As another example of trills, here's the song of a Savannah Sparrow.

 

Notice the song starts with some relatively clear slurs, and that the buzzes have some remnants of harmonics and a distinct change of pitch. Since the Savannah Sparrow's trill shows much denser lines than the trills in the above warbler examples, it will sound much noisier. However, it won't sound as noisy as the Seaside or Nelson's Sharp-tailed Sparrows in the example below.

NOISY VOCALIZATIONS

I mentioned above that noise is the simultaneous sounding of many unrelated, close pitches. In an audio spectrogram it looks like a big block of dark color. If the graphic of the noise shows black from the very bottom to the top of the AS, then there will be no pitch content at all. Our ear will not hear the sound as having any pitch. However if the "blob" is concentrated in one part of the audio spectrogram, then the noise can sound high or low, especially in relation to other noises in the song.
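Here is a rough sketch of that distinction. Given the darkness of each frequency row at one moment, it reports how much of the spectrum the dark block covers and where its middle sits. The function name, the ten-row example, and the 0.5 darkness cutoff are assumptions for illustration.

```python
def describe_noise(column, bin_hz, threshold=0.5):
    """column: energy per frequency row at one moment, lowest row first.
    Returns (fraction of rows that are dark, middle frequency of the dark rows)."""
    active = [i for i, energy in enumerate(column) if energy >= threshold]
    if not active:
        return 0.0, None
    coverage = len(active) / len(column)
    # Row i spans i*bin_hz to (i+1)*bin_hz, so its middle is (i + 0.5) * bin_hz.
    centre_hz = (sum(active) / len(active) + 0.5) * bin_hz
    return coverage, centre_hz


full_band = [1] * 10                           # dark from bottom to top
low_band = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]      # dark only in the lower 2/5ths
print(describe_noise(full_band, 1000))   # (1.0, 5000.0)
print(describe_noise(low_band, 1000))    # (0.4, 2000.0)
```

Coverage near 1.0 means pitchless, broad spectrum noise. A smaller coverage means the noise will seem to sit high or low, depending on where the middle of the block falls.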

Here are two songs that contain noise.

 

 

The noise at the end of the Seaside Sparrow's song is a very dense, fairly even "mass" in the AS. This indicates that the noise has very little pitch, but just sounds like broad spectrum noise.

In addition to the noise, it's interesting to notice that before the last large section of noise there are three "notes" that are broad, simultaneous sounding of many pitches near each other. That indicates this section won't sound like a clear tone, or a whistle.

But the noise isn't monolithic. Notice that there are actually 4 different wide lines on top of each other, with space between them. This indicates the sound will have some of the characteristics of a tone with harmonics. So this section of the Seaside Sparrow's song will have a much more "musical" or rich sound for these three notes. And in fact this change of characteristic within the song is useful in separating the Seaside Sparrow from other sparrows that share its reedy habitat.


The Nelson's Sharp-tailed Sparrow's song has a less "broad" noise characteristic. The graphic for the first section of noise is concentrated in the lower 2/5ths of the spectrogram. This indicates that there will be some sense of pitch to the noise. It won't sound rich, as there are no indications of harmonic content. However, you will hear a definite change in pitch of the noise as the song progresses from the first section of noise to the second section.

 
ANALYZING CALL NOTES


Short call notes are the bane of many birders. Songs of many species can sound very similar, and many otherwise audio-oriented birders balk at the threshold of learning call notes. Here again, audio spectrograms can help out, at least in some cases.
Let's look at the notes of five different birds that can be found calling near each other during migration on the East Coast of the US.


A couple of initial ideas: Call notes can vary from being almost pure "pitched" noise to a whistle-like tone. As we have discussed, in an audio spectrogram, noise is indicated by a block of black. As mentioned above, if the block covers the whole audio spectrum, then there will be no indication of pitch. If the block is restricted to one small part of the audio spectrum, our ear will hear the vocalization as having some pitch, maybe sounding bassier or darker, or higher or lighter, than other birds or other parts of the same call. If there are harmonics, even harmonics that approach being noise themselves, as seen above, the vocalization will be richer than noise without harmonics.
With this in mind, let's look at the call notes of these five birds.

 

 

The calls of the Hooded Warbler and the Common Yellowthroat are basically pure noise. Not full spectrum noise, but there are no harmonics to add any richness to the call. Looking closer, we can see that the COYE's call is actually made up of several very fast iterations of noise with small intervening spaces. This call, then, will sound more like a very fast rattle than a pure monolithic sound. If you listen for this fast variation, the call becomes much easier to ID.

In contrast, the Hooded Warbler's call is very monolithic. Notice also that it is, relatively speaking, a very long call note, and trails off towards the end of the call note. In fact, the note is more than twice as long as the COYE's call note.

The Chipping Sparrow's call note is very short and simple. It's basically one very fast event. Notice that, even though it is the highest of the call notes we're discussing, it has one harmonic, showing that it is a fairly rich tone.

The Yellow and Magnolia Warblers both have much more pitched call notes. Both show harmonics and considerable variation in pitch within the note. The Yellow Warbler's call note ends lower than it starts and has a fast up and down movement, with most of the energy of the call in the downward movement.

The Magnolia Warbler's note is more gentle sounding, with most of the energy of the call at the highest point in the call and then a short falling off of the pitch.


Of course when discussing call notes, we're referring to very short vocalizations. The variations we've seen are taking place often in less than 1/10th of a second. Although the Hooded Warbler's call is twice as long as that of the Common Yellowthroat, the difference is only 1/20th of a second. These differences are difficult to pick up when hearing the birds in the field. However one of the contentions of this article is that if you study audio spectrograms, particularly of these difficult vocalizations, you can discover much more easily what you need to listen for in the field. And these discoveries will in fact help you become much better at ID'ing birds from their call notes. In other words, it IS possible to hear these differences. But it helps a lot to know what you are listening for!


ON TO THE THRASHERS

Now let's see how audio spectrograms can help us distinguish between the songs of a difficult group of birds, the thrashers found in Southeast Arizona. We'll examine the songs considering the following criteria: the rhythm of the song, whether there are stops or spaces in the song, the number of different song elements and how they vary, and the tonal or "pitched" range of the song.

These audio spectrograms are of the first 7 or so seconds of the thrasher songs found in the Stokes Field Guide to Western Birds, if you'd like to listen along.

The song of the Crissal Thrasher is a good place to start.

 

Notice first that the song is divided into very obvious sections with very visible short or longer pauses separating the sections. The sections all look fairly different from each other, including a two-part slow slur, a trill, and a three-part faster slur. There are also clear variations in pitch from section to section.
This song will sound divided into sections and will have a lot of variation.
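The eye picks out these sections easily, and the same idea can be sketched in code: walk along the loudness of the song over time and split it wherever the quiet lasts long enough to count as a pause. The function name, the thresholds, and the made-up loudness values are assumptions, not measurements from the Crissal recording.

```python
def find_sections(loudness, quiet=0.1, min_pause=3):
    """loudness: one value per time frame. Returns (start, end) frame pairs for
    each stretch of sound, with end exclusive. Gaps shorter than min_pause
    frames do not split a section."""
    sections = []
    start = None
    silent_run = 0
    for i, level in enumerate(loudness):
        if level > quiet:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                sections.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        sections.append((start, len(loudness) - silent_run))
    return sections


# Made-up loudness track: sound, a one-frame gap, more sound, a long pause, sound.
example = [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(find_sections(example))   # [(1, 7), (11, 13)]
```

A song with obvious pauses yields several separate sections, while a run-on song with no real pauses would come back as one long section.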


Now let's contrast the Crissal's AS with the similar LeConte's Thrasher.

 

 

This song also has some long pauses, in fact even longer and more relaxed pauses than the Crissal's song. Now notice that the sections of the song all look much more similar to each other than the Crissal's repertoire, indicating less variation in the song.

Also, you can see a number of "spikes," or narrow, tall lines, indicating fast, chip-like notes. These chips seem to punctuate much of the song, unlike the Crissal's song, which has a lot more variety and is made up mostly of slurs and trills.
The rhythm of the LeConte's song is also fairly similar from section to section. There isn't nearly as much variation as you've seen in the Crissal's song.

Finally, the pitch of this song stays in the same general range. Although individual sections have slurs, the basic pitch from section to section is very similar. Again this is in contrast with the more variable Crissal's song.


The Bendire's Thrasher has a very constant, almost run-on song, indicated by a steady rhythm with virtually no pauses. This is very different from the previous two species, which have variable and fairly long pauses from section to section.

 

The Bendire's song shows elements that are often repeated three or four times, so the song will have some feel of repetition. However, the sections are not set off from each other by pauses as they are in the Crissal.

Notice that the pitch of the elements falls in the same basic range, but that in many song elements there is a fairly wide range of harmonics. That indicates the song will be fairly "rich" sounding: not thin and not deep. However, there won't be much of a sense of change in pitch. So the song will sound like a very rich, run-on collection of repeated sections.


The Sage Thrasher also has a run-on song with few pauses. Notice, however, that there are "pick up" chips, or very light single lines, throughout the song. No other thrasher song shows these so consistently.


Also, the pitch range is very limited and much lower, with fewer harmonics than the Bendire's. That indicates the song will seem to be of consistent tonal quality and pitch throughout, and will seem lower and quite a bit less rich than the Bendire's.


Finally let's consider the Curve-billed Thrasher.

 

 

Unlike the Bendire's and Sage Thrasher's songs, this song contains distinct pauses and much more variability in pitch. This suggests the Crissal's song. But notice that the pauses are very brief and infrequent, unlike the longer, more variable and relaxed pauses in the Crissal's song.
 
Also, the Curve-billed song contains many shorter, chip-like sounds as distinct elements within the vocalization. This makes the song sound a bit harsher and can be very easy to pick out in the field.
So the Curve-billed Thrasher's song will seem faster, with fewer and shorter pauses, and will have distinct chips that are song elements, not pick up chips as in the Sage Thrasher's song.


In summary, the audio spectrograms make it very clear that three species of thrasher have significant, regular pauses in their songs and two do not. This should make it very easy to distinguish, for example, a Bendire's from a Crissal Thrasher's song.
 
Among the run-on songs, the Bendire's song is much richer, contains more repeated elements, and is not punctuated by chips as is the more monotonic Sage Thrasher's song.

For the group with pauses, the Crissal is fairly relaxed and contains a lot of variety and pitch change. The LeConte's has even longer pauses, but the elements are much more similar, are repeated more times, and are often punctuated by chips as part of the repeated song elements. The Curve-billed song is more "nervous," with fewer and shorter pauses. It contains chips that are their own elements, not part of a larger pattern. And there is much less repetition of song elements than in the more relaxed LeConte's and Crissal songs.

 

CONCLUSION


Hopefully this article has been able to demonstrate how useful audio spectrograms can be in aiding you in identifying difficult songs and calls. Of course any exercise that helps you focus more intently on a song will be beneficial. Audio spectrograms offer a unique opportunity for you to analyze and compare the rhythm, tonal quality, and overall form of bird vocalizations and, by seeing "inside" the song, become more familiar with the harder to hear elements. Once you are in the field you will then be able to focus more easily on these elements and identify the vocalizations of previously difficult to ID species.

 

Copyright 2009 Tom Stephenson

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 