When comparing music to speech, there are four essential physical differences that warrant consideration during a hearing aid (or cochlear implant) fitting. Understanding these differences and selecting appropriate hearing aid circuitry will help to optimize the enjoyment of music for the hard-of-hearing listener. This applies equally to those who are musicians as to those who like to listen to (sometimes loud) music.
In most cases, regardless of the music program in the hearing aid, if the front end is distorting (because of music's more intense input), nothing implemented later in the hearing aid can improve the situation. Understanding and managing the four physical differences between speech and music will allow the hearing health care professional to select electroacoustic parameters that provide maximum enjoyment of music for musicians and music-lovers.
Acoustic Structure of Music
Four primary physical differences between speech and music are:
- The long-term spectrum of music vs speech.
- Differing overall intensities.
- Crest factors.
- Phonetic vs phonemic perceptual requirements.
Music long-term spectrum vs speech long-term spectrum. It is understandable that speech has a relatively well-defined and uniform long-term speech spectrum regardless of the language spoken. All speech derives from the human vocal tract, which on average is 17 cm long, is closed acoustically at the vocal cords, and is open acoustically either at the lips or, in the case of nasals, the nostrils (just like the tubing resonances in a behind-the-ear hearing aid). Sound resonances (formants) are generated by well-defined quarter-wavelength resonators (eg, low vowels such as /a/), constrictions (eg, high vowels and fricatives), temporal stops (eg, affricates and stops), and nasals, to name a few mechanisms common to all languages.1
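The quarter-wavelength arithmetic above is easy to verify. The short Python sketch below (the function name and the 343 m/s speed of sound are my illustrative choices, not from any fitting software) computes the odd-harmonic resonances of a tube closed at one end and open at the other, such as the vocal tract:

```python
def quarter_wave_resonances(length_m, n_modes=3, speed_of_sound=343.0):
    """Resonances of a tube closed at one end and open at the other:
    f_n = (2n - 1) * c / (4L), ie, odd multiples of the first mode."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, n_modes + 1)]

# A 17 cm vocal tract: roughly 500, 1500, 2500 Hz, the familiar
# neutral-vowel formant spacing.
print([round(f) for f in quarter_wave_resonances(0.17)])  # [504, 1513, 2522]
```

The same arithmetic with c/(2L) gives the all-harmonic series of the half-wavelength resonators (strings, saxophone) mentioned below.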
While the study of these allophones can be complex, the resulting long-term speech spectrum is well defined and can be used in the development of a target for amplified speech. Speech tends to have its greatest intensity in the lower frequency ranges, with the higher frequency fricatives (eg, /s/, /f/, etc.) being of lower intensity. This is well understood and has been used to set parameters for gain, compression, and for hearing aid quality control testing.
In contrast, music can derive from many sources, such as the vocal tract, a percussive instrument (eg, drums), a woodwind or brass instrument with a dependence of both a quarter (eg, clarinet) and a one-half (eg, saxophone) wavelength resonator, and any number of instruments that are stringed with a rich harmonic structure of a half-wavelength resonator (eg, violin and guitar). These can be amplified or unamplified. Even if unamplified, depending on the music, it may be of low intensity or of high intensity. And, depending on the instrument, the music spectrum may be high- or low-frequency in emphasis.
Unlike speech, music is highly variable, and the goal of defining a long-term music spectrum is poorly conceived; there is simply no music target as there is for amplified speech.
Differing overall intensities. At 1 meter, speech averages 65 dB SPL (root mean square, or RMS) and has peaks and valleys of about 12-15 dB in magnitude. Rarely is shouted speech much greater than 90 dB SPL. We have quite good estimates of soft, medium, and loud speech, which are used to set modern (digital) hearing aids. Speech comes from the human vocal tract, and similar human lungs impart similar subglottal pressures to drive the vocal cords. Given the physical limitations of vocalization, the potential intensity range is quite restricted: approximately 30-35 dB.
In contrast, depending on the music played or listened to, various instruments can generate anything from very soft sounds (20-30 dB SPL; eg, brushes on a jazz drum) to amplified guitar and even the brass of Wagner's Ring Cycle (in excess of 120 dB SPL). (Author's note: I have no idea of the excruciatingly loud sound level of a piccolo because I have forbidden all piccolo players from coming to my office!)
The dynamic range of music as an input to a hearing aid is therefore on the order of 100 dB (versus only 30-35 dB for speech). Overall, when played at an average or mezzo forte level, classical music at the player's ear is typically 85-100 dB SPL. Rock and roll, on average, tends to be another 10-15 dB more intense, depending on the nature of the band.2
Crest factors. The crest factor is the difference in decibels between the peak (most intense part) of a waveform and its average or RMS value. The RMS value corresponds with one's perception of loudness, the subjective attribute correlating with intensity.
For speech, the RMS is about 65 dB with peaks extending about 12 dB higher. The crest factor for speech is therefore on the order of about 12 dB. This is well known in the hearing aid industry, and compression systems and hearing aid test boxes use this information. The reasons for the 12 dB crest factor for speech are many, but generally correspond to the damping or loss of energy that is inherent in the vocal tract of the speaker. Before a spoken word is heard, the vocal energy passes by a soft tongue, with soft cheeks and lips, and a nasal cavity full of soft tissue and occasionally some other foreign snotty materials. These soft tissues damp the sound such that the peaks are generally only 12 dB more intense than the average intensity of the speech.
In contrast, a trumpet has no soft walls or lips. The same can be said of most musical instruments and, as such, the peaks are less damped and peakier relative to the average than speech. Crest factors of 18-20 dB are not uncommon for many musical instruments.
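The crest-factor definition above translates directly into code. A minimal Python sketch (the function name is mine; the example uses a pure sine, whose ~3 dB crest factor is well below the 12 dB of speech or the 18-20 dB of instruments):

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: peak level minus RMS level of a waveform."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# A pure sine wave has a crest factor of 20*log10(sqrt(2)), about 3 dB.
n = 1000
sine = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
print(round(crest_factor_db(sine), 1))  # 3.0
```

Running the same function on recorded speech versus a trumpet passage would show the roughly 6 dB difference discussed above.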
Hearing instrument compression systems and detectors that are based on peak sound pressure levels may therefore have different operating characteristics for music as an input compared to speech. That is, music may cause some compression systems to enter their non-linear phase at a lower intensity than would be appropriate for that individual.
Phonetic vs phonemic perceptual requirements. This refers to the difference between what is actually heard, the physical vibrations in the air (phonetic), as opposed to the perceptual needs or requirements of the individual or group of individuals (phonemic). These terms derive from the study of linguistics, but have direct applicability here.
For all languages of the world, the long-term speech spectrum has most of its energy in the lower frequency region and less in the higher frequency region (its phonetic manifestation), but speech intelligibility derives largely from the mid- and high-frequency regions. This mismatch between energy (phonetic) and clarity (phonemic) is complex but well understood in the field of hearing aid amplification.
In contrast to speech, some musicians need to hear the lower frequency sounds more than others regardless of the output (phonetics) of the instrument. A clarinet player, for example, is typically satisfied with their tone only if the lower frequency inter-resonant breathiness is at a certain level, despite the fact that the clarinet can generate significant amounts of high-frequency energy. This is in sharp contrast to a violin player who needs to hear the magnitude of the higher frequency harmonics before they judge it to be a good sound. The clarinet and the violin both have similar energy spectra (similar phonetics) but dramatically differing uses of the sound (phonemics).
To improve the clarity of speech for a person with presbycusis, one typically enhances the higher frequency gain and output. While this would also be ideal for a violinist, it would be detrimental for a woodwind player such as a clarinetist. The output energy/perceptual requirements (phonetic/phonemic) mismatch for musicians throws many land mines in the path of selecting the correct gain and output parameters for listening to music.
These four differences between the physical properties of speech and music can now serve as the basis for differing electroacoustic settings of a hearing aid for speech versus music as inputs.
Peak Input Limiting Level in Hearing Aids: The Front End
Paying attention to the peak input levels at the front end of hearing aid processing is perhaps the most important of all factors in selecting a set of electroacoustic parameters that are optimal or near optimal for listening to amplified music through a hearing aid. Some hearing aids on the market have a limiter or clipper that prevents sounds above 85-90 dB SPL from effectively getting through the hearing aid. This is gradually changing in the industry, however, and many modern hearing aids have peak input limiting levels that are on the order of 100-105 dB SPL.
Historically this has been quite reasonable, because the most intense components of shouted speech are on the order of 85-90 dB SPL. In addition, manufacturers of digital hearing aids want to ensure that the analog-to-digital (A/D) converter is not overdriven. Anything more intense is not speech (or speech-like), and as such, this limiter functions as a rudimentary noise reduction system.
However, music is generally much more intense than 85-90 dB SPL and, as such, is limited or distorted at the front end of the hearing aid. Modern hearing aid microphones can certainly handle up to 115 dB SPL without appreciable distortion, so there is no inherent reason (other than adherence to history and protection of a poorly configured A/D converter) for having an input-limiting level set so low.
Once intense inputs are limited and distorted at the front end of the hearing aid, regardless of the music program that comes later on, the music will never be clear and of high fidelity. Techniques are available to avoid this front-end distortion problem; depending on the implementation, they may use a compressor to sneak under the peak limiter (with expansion after the limiting point in the hearing aid). Hearing aids are also available with a very high peak input limiting level. A good metaphor is a plane flying under a low bridge. If the bridge is too low, the plane will crash and only debris (distorted music) can get through. Either the bridge should be raised (the peak input limiting level is increased) or the plane needs to fly lower (the input to the hearing aid is lowered). An excellent demonstration of this phenomenon can be found at www.randomizedtimer.net/music or in the links section of the Musicians' Clinics of Canada Web site (www.musiciansclinics.com, under Marshall Chasin's PowerPoint lectures).
Research has shown that anything below a peak input limiting level of 105 dB SPL will cause deleterious distortion for music, regardless of what program(s) come later in the hearing aid.3,4 A quick-and-dirty clinical test to determine whether a hearing aid's front end clips or distorts loud music is to set the output high (>115 dB SPL) and the gain low (5-8 dB). In a hearing aid test box, apply an intense signal (eg, 100 dB SPL); there should not be any peak clipping, since the output is set above 115 dB SPL. If there is a high level of distortion (>10%), then the culprit is most likely a peak input limiting level that is too low to handle (intense) music.
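The effect that this test-box procedure looks for can be simulated. The sketch below (pure Python; `thd_percent`, the single-bin DFT, and the 70% clip level are my own illustrative choices, not a standardized measurement method) shows how hard clipping of an intense pure tone, the signature of a too-low front-end limiter, produces harmonic distortion well beyond the 10% criterion:

```python
import math

def tone_magnitude(samples, freq, sample_rate):
    """Magnitude of one frequency component via a single DFT bin."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return math.hypot(re, im) * 2 / n

def thd_percent(samples, fundamental, sample_rate, n_harmonics=5):
    """Total harmonic distortion: harmonic energy relative to the fundamental."""
    fund = tone_magnitude(samples, fundamental, sample_rate)
    harm = math.sqrt(sum(tone_magnitude(samples, fundamental * k, sample_rate) ** 2
                         for k in range(2, n_harmonics + 2)))
    return 100 * harm / fund

# Simulate a front end that hard-clips an intense 500 Hz tone at 70% of its peak.
sr, f = 8000, 500
clean = [math.sin(2 * math.pi * f * i / sr) for i in range(sr)]
clipped = [max(-0.7, min(0.7, s)) for s in clean]
print(round(thd_percent(clean, f, sr), 1))    # essentially 0% for the clean tone
print(round(thd_percent(clipped, f, sr), 1))  # well above the 10% criterion
```

In the real test the signal is acoustic rather than simulated, but the logic is the same: high THD on an intense input, with the output limit ruled out, implicates the front end.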
An example of a hearing aid that has a very high peak input limiter is Etymotic Research's K-AMP. Another, depending on its implementation, is the new Venture digital platform from Gennum Corporation (in fact, an earlier version of this product served as the basis for Etymotic's Digi-K). The peak input limiting level is not required to be printed on a hearing aid specification sheet, so you need to contact the representative of the hearing aid manufacturer for specific details.
If a client has a hearing aid with a peak input limiting level that is too low for music, one strategy is to turn down the input (eg, home stereo or MP3 system) and turn up the gain of the hearing aid. Using our previous analogy, this is like letting the plane fly lower to get under the bridge. Another approach uses a resistive network just after the hearing aid microphone that fools the hearing aid into thinking that the input is 10-15 dB less intense. Typically this resistive network is engaged (with a button) only for the music program. Other, less elegant techniques include placing a Band-Aid-like cover over the hearing aid microphone(s) to fool the hearing aid into thinking that the input is lower (because of the attenuation of the cover). The gain may or may not need to be increased to compensate, since music is generally more intense than speech.
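The arithmetic behind the pad-plus-gain strategy is simple dB subtraction. A toy example (the limiter and pad values are illustrative, not from any product):

```python
def attenuated_input_db(input_db_spl, pad_db):
    """Effective input level after a resistive pad or microphone cover."""
    return input_db_spl - pad_db

# Music at 100 dB SPL meets a hypothetical 90 dB SPL peak input limiter:
# a 12 dB pad brings the signal to 88 dB SPL, "flying under the bridge,"
# and the lost level is made up with gain applied after the limiting point.
print(attenuated_input_db(100, 12))  # 88
```

The pad must sit before the limiter and the make-up gain after it; attenuating and amplifying on the same side of the limiter would accomplish nothing.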
One Channel is Best for Music
In sharp contrast to hearing speech (especially in noise), one channel (or, equivalently, many channels with the same compression ratios and kneepoints) appears to be the appropriate choice for hearing music. This recommendation derives from clinical work with hundreds of hard-of-hearing musicians over the past 20 years.
Unlike speech, for most types of music the relative balance between the lower frequency fundamental energy and the higher frequency harmonics is crucial. High-fidelity music depends on many parameters, one of which is the audibility of the higher frequency harmonics at the correct amplitude. Poor fidelity can result from the intensity of these harmonics being either too low or too high. A multi-channel hearing aid that uses differing activation points and differing degrees of compression for various channels runs the distinct risk of severely altering this important low-frequency (fundamental)/high-frequency (harmonic) balance. Consequently, a music program within a hearing aid should be one channel or, equivalently, a multi-channel system in which all compression parameters are set in a similar fashion. It has been suggested that, in some bass-heavy situations, a two-channel system may be useful, with the lower frequency channel set at 500 Hz and greater attenuation at higher input levels (L. Revit, personal communication, 2004).
Compression. The clinical rules of thumb for setting compression parameters for speech are rather straightforward. The compression detectors are set based on the crest factor of speech, which is on the order of 12 dB (ie, as discussed earlier, the peaks are 12 dB more intense than the RMS). For speech, compression systems function to limit overly intense outputs and to ensure that soft sounds are heard as soft, medium sounds as medium, and intense sounds as intense (but not uncomfortable). In short, these systems take the dynamic range of speech (30-35 dB) and alter it to correspond with the dynamic range of the hard-of-hearing person. There is no inherent reason why a wide dynamic range compression (WDRC) system that works well for a client with speech as input should not also work well for music. However, the dynamic range of music is typically much greater than that of speech, being on the order of 80-100 dB.
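How a static WDRC rule maps these two input ranges into a listener's residual dynamic range can be sketched in a few lines. The kneepoint, ratio, and gain below are illustrative numbers, not a prescriptive fitting formula:

```python
def wdrc_output_db(input_db, kneepoint_db=45, ratio=2.0, gain_db=20):
    """Static WDRC input-output rule: linear gain below the kneepoint,
    compression (here 2:1) above it. All parameters are illustrative."""
    if input_db <= kneepoint_db:
        return input_db + gain_db
    return kneepoint_db + gain_db + (input_db - kneepoint_db) / ratio

# Speech's ~30 dB input range (50-80 dB SPL) vs music's ~80 dB range (30-110):
speech_range = wdrc_output_db(80) - wdrc_output_db(50)
music_range = wdrc_output_db(110) - wdrc_output_db(30)
print(speech_range, music_range)  # 15.0 47.5
```

The same curve that squeezes speech into 15 dB of output still leaves music spanning nearly 50 dB, which is why the detector behavior (next paragraph) matters more than the curve itself.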
Having acknowledged this fact, it turns out that, clinically speaking, no major changes are required because the more intense components of music are found in a different part of the input-output curve of the compression function. The difference resides in whether the compression system uses a peak detector or an RMS detector. If the compressor uses an RMS (or average intensity) detector, then no changes need to be made for a music program. However, if the hearing aid utilizes a peak detector to activate the compression circuit, the detector in a music program should be set about 5-8 dB higher than for speech. This is related to the larger crest factor of music (18 dB vs 12 dB for speech), and care should be taken that these peaks do not activate the compression circuit prematurely.
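The peak-versus-RMS detector distinction reduces to crest-factor arithmetic. A small sketch (illustrative dB bookkeeping, not any specific product's behavior) shows why a peak-triggered kneepoint tuned for speech fires about 6 dB "early" on music:

```python
def detector_reading_db(rms_db, crest_db, detector="rms"):
    """Level registered by a compression detector: an RMS detector tracks
    the average level; a peak detector tracks average plus crest factor."""
    return rms_db + (crest_db if detector == "peak" else 0)

# Speech and music at the same average loudness (65 dB SPL RMS):
speech_peak = detector_reading_db(65, 12, "peak")  # speech crest factor ~12 dB
music_peak = detector_reading_db(65, 18, "peak")   # music crest factor ~18 dB
print(music_peak - speech_peak)  # 6
```

That 6 dB gap is the reason for the 5-8 dB kneepoint adjustment suggested above; with an RMS detector both readings are 65 dB and no adjustment is needed.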
Feedback reduction systems. In most cases, since the overall intensity of music is greater than that of speech, feedback is not an issue: the gain of the hearing aid for these higher-level inputs is typically less than that for speech. However, if feedback reduction is required, or the feedback circuit cannot be disabled in a music program (as it can be, for example, in the Bernafon ICOS), then systems that utilize a gain reduction method (eg, Phonak Perseo, or Widex Diva, which uses this approach only for the music program) would be the best for music.
It is not so much that gain reduction is the proper approach, but that the other two approaches, notch filtering and phase cancellation, can have problems. Depending on the implementation of the notch filtering, the center frequency of the filter may hop around searching for the feedback, thereby causing a distortion (blurry sound). Although this artifact has been reported in the literature,5 I have never experienced this frequency-hopping artifact in the clinic. The phase cancellation approach generates a signal that is 180° out of phase with the feedback. Although this works well for speech, and the majority of hearing aid manufacturers use such a technique, the narrow-bandwidth harmonics of music (since there is minimal damping) can, and do, confuse the hearing aid into suppressing the music. In addition, if the harmonic is of short duration, the created cancellation signal can become audible and is heard as a brief chirp. Two approaches have been used to remediate this. One is to limit the feedback detector to the very high frequency range where the musical harmonic structure is inherently less intense (eg, Oticon Syncro and Siemens Triano); the other is to use a two-stage phase cancellation technique with both fast and slow attack times (eg, Siemens Acuris and Bernafon Symbio). However, if at all possible, disabling any feedback reduction system would be the optimal approach for listening to or playing music.
Noise reduction systems. As with feedback reduction systems, it is best to disable the noise reduction system when listening to music. Typically, the signal-to-noise ratio (SNR) is quite favorable when listening to music, so noise reduction is unnecessary. However, in some hearing aids the noise reduction system cannot be disabled; since the primary benefit of noise reduction seems to be improved listening comfort rather than actual noise reduction, choosing an approach that has minimal effect on music may be beneficial for a music program.
TABLE 2. Median modulation rates and depths for speech, instrumental music, and environmental noise. These differences may be useful for hearing aid circuitry that automatically distinguishes between these inputs.
Most noise reduction systems in use today use a form of modulation detection. Modulation is simply the change from quieter to more intense portions of a waveform. The number of times this occurs each second is called the modulation rate. For speech, it is roughly the number of times that one opens and closes one's mouth every second, and it has most of its response in the 4-6 Hz region. The modulation depth is the difference in decibels between the quietest and most intense elements; for speech this is typically 30-35 dB (Table 2). For noise, the modulation rate is typically very low (<2 Hz, if not zero) and the related modulation depth is also very low (<15 dB). In contrast, for music, the modulation rate can be as high as 100 Hz, with a modulation depth of up to 60 dB.
Understandably, modulation rate and depth have been used primarily to distinguish between speech and noise. Their usefulness has been more for comfort than for improvement of SNR; however, they may have great ramifications for automatically distinguishing music from speech (or noise). Table 2 also shows the modulation rate and depth for music and, as can be seen, this innovation, already widespread in the hearing aid industry, can be used to automatically engage a music program with all parameters set to optimize listening to and playing music.
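Both modulation measures can be estimated directly from a waveform's envelope. The sketch below (a frame-RMS envelope with mean-crossing counting; the function and all parameter choices are mine, not any manufacturer's algorithm) recovers the rate of a tone amplitude-modulated at a speech-like 4 Hz:

```python
import math

def modulation_stats(samples, sample_rate, frame_ms=10):
    """Estimate modulation rate (envelope fluctuations per second) and
    modulation depth (dB span of a frame-RMS envelope)."""
    frame = int(sample_rate * frame_ms / 1000)
    env = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(s * s for s in chunk) / frame)
        env.append(max(rms, 1e-6))  # floor keeps the log finite
    depth_db = 20 * math.log10(max(env) / min(env))
    # Rate: upward crossings of the envelope's mean level, per second.
    mean = sum(env) / len(env)
    crossings = sum(1 for a, b in zip(env, env[1:]) if a < mean <= b)
    duration = len(env) * frame / sample_rate
    return crossings / duration, depth_db

# A 1 kHz tone, 100% amplitude-modulated at a speech-like 4 Hz rate:
sr = 8000
sig = [(0.5 + 0.5 * math.sin(2 * math.pi * 4 * t / sr)) *
       math.sin(2 * math.pi * 1000 * t / sr) for t in range(2 * sr)]
rate, depth = modulation_stats(sig, sr)
print(round(rate), depth > 20)
```

A real detector would use smoother envelope tracking, but the principle, rate from envelope fluctuations and depth from the envelope's dB span, is the same.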
There is no inherent reason why someone needs to push a button to listen to music. It should be as automatic for the hearing aid user as it is for a normal-hearing listener. The technology is available to have a hearing aid automatically identify an input as music or music-like and engage the music program without the hearing aid user having to do anything.
Defining The Music Program
In conclusion, a music program, or a set of optimal electroacoustic parameters for enjoying music, would include:
1) A sufficiently high peak input limiting level so that the more intense components of music are not distorted at the front end of the hearing aid.
2) Either a single channel or a multichannel system in which all channels are set for similar compression ratios and kneepoints.
3) A compression system similar to the speech-based one: no change is needed with an RMS detector, but the kneepoint should be set 5-8 dB higher if the hearing aid uses a peak compression detector.
4) A disabled feedback reduction system, or a feedback reduction system that uses gain reduction or a more sophisticated form of phase feedback cancellation (either one with short and long attack times or one that only operates on a restricted range of frequencies such as over 2000 Hz).
5) If the noise reduction system cannot be disabled, a circuit that distinguishes between low (4-6 Hz) and high (10-100 Hz) modulation rates may be useful for differentiating speech from music, and automatically turning on a music program.
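Item 5 amounts to a simple decision rule over the Table 2 figures. A toy classifier (thresholds are my illustrative reading of the values discussed above: speech ~4-6 Hz and 30-35 dB, noise <2 Hz and <15 dB, music up to ~100 Hz and ~60 dB):

```python
def classify_input(mod_rate_hz, mod_depth_db):
    """Toy input classifier from modulation rate and depth.
    Thresholds are illustrative, not from any product."""
    if mod_rate_hz < 2 and mod_depth_db < 15:
        return "noise"
    if mod_rate_hz <= 8 and mod_depth_db <= 40:
        return "speech"
    return "music"

print(classify_input(5, 33))   # speech
print(classify_input(1, 10))   # noise
print(classify_input(40, 55))  # music
```

A shipping implementation would smooth these estimates over time and add hysteresis so the hearing aid does not flip programs on every brief pause.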
How Loud Is That Musical Instrument?
These measurements were taken by the author over the past 20 years and represent the approximate sound levels typically generated by musicians. These data are from a wide range of music types so, in some cases, the levels have a large range. (Note: Any errors in these data are entirely my own, and I am the first to admit that they are not artifact-free.) Most of the measurements were taken at 3 meters (10 feet) on the playing plane of the musician. In some cases (eg, violin) measurements were also taken on the left shoulder. Whenever the measurement was not taken at 3 meters, it is indicated in the chart.
Most hearing conservation programs, and many hearing conservation regulations in the United States and abroad, use dBA as an important measurement. The dBA, or A-weighted decibel, measurement is an attempt to simulate how the human ear responds to sound levels. The A-weighted scale filters out the lower frequency sounds in much the same way that our auditory system does: low frequency sounds need to be at a higher sound pressure level (SPL) before being heard by normal-hearing people. Prolonged exposure to levels of 85 dBA or greater can result in permanent hearing loss. (For more information, see the article by Patty Niquette in this issue of HR.)
As a point of interest, the peak or maximum SPL is also provided in the table. Unlike the dBA measurement, the peak SPL has little predictive power for hearing loss unless it is very high. In cases of percussive blasts (eg, a gun or a drum rim shot), single intense exposures may cause a permanent hearing loss. This is called acoustic trauma, but it is relatively rare compared to the long-term noise- or music-induced hearing loss that occurs over many years.
Marshall Chasin, AuD, is the audiologist and director of research at the Musicians' Clinics of Canada, Toronto, and author of several books, including Musicians and the Prevention of Hearing Loss (Singular Publishing).
Correspondence can be addressed to HR or Marshall Chasin, AuD, Musicians' Clinics of Canada, #340-440 College St, Toronto, Ontario, Canada, M5T 3A9; email: [email protected].
1. Kent RD, Read C. Acoustic Analysis of Speech. 2nd Ed. New York: Delmar; 2002.
2. Chasin M. Musicians and the Prevention of Hearing Loss. San Diego: Singular Publishing Group; 1996.
3. Chasin M. Music and hearing aids. Hear Jour. 2003;56(7):36-41.
4. Chasin M, Russo FA. Hearing Aids and Music. Trends in Amplif. 2004;8(4):35-47.
5. Chung K. Challenges and recent developments in hearing aids: Part 1. Speech understanding in noise, microphone technologies and noise reduction algorithms. Trends in Amplif. 2004;8(3):83-124.