One aspect of compression that has received less attention over the years is timing parameters. A new compression system by Oticon called Speech Guard monitors and responds to high-level inputs while at the same time monitoring and responding to the ongoing speech signal, using the best elements of fast and slow acting compression.

For the past two decades, amplification for patients with sensorineural hearing loss (SNHL) has been driven by the concept of wide dynamic range compression (WDRC). It has been known for many years that a core characteristic of SNHL is the reduction in dynamic range. The amount of “working space” within the auditory system (the range between threshold and the uncomfortable loudness level, or UCL) is typically smaller than the full range of speech signals that a person is likely to encounter throughout the course of the day.1

The WDRC approach was developed to take the full range of speech inputs (the softest parts of soft speech through the loudest parts of loud speech) and place them within the remaining dynamic range of the patient. Over the years, a variety of schemes have been developed to calculate the appropriate gain required for different input levels in order to achieve the goal of full audibility. Most of the attention in this effort has been paid to questions such as how audibility can be maximized without sound levels violating the patient’s loudness tolerance, what minimal amount of audibility is required to understand an ongoing signal, and which frequency regions should be prioritized.

One aspect that has received less attention is the timing parameters of a compression system. The basic concept of compression is that the gain applied to the signal is inversely related to the input level: when the input level goes up, the gain decreases; when the input level drops again, the gain goes back up. However, the response of compression systems is typically not instantaneous. Typical input signals, especially speech, are not of a uniform level. Therefore, “waiting periods,” commonly known as attack and release times, are often built into the response patterns of nonlinear circuitry.
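To make the static part of this relationship concrete, the following minimal Python sketch computes gain from input level for a single-knee WDRC rule. The threshold, compression ratio, and linear gain values are illustrative assumptions, not settings taken from any hearing aid or fitting rationale.

```python
def wdrc_gain_db(input_db, threshold_db=45.0, ratio=2.0, linear_gain_db=30.0):
    """Gain in dB for a given input level under a simple one-knee WDRC rule.

    All default values are hypothetical and chosen only for illustration.
    """
    if input_db <= threshold_db:
        # Below the compression threshold the gain is constant (linear).
        return linear_gain_db
    # Above the threshold the output rises only 1/ratio dB per dB of input,
    # so the gain falls as the input level climbs.
    return linear_gain_db - (input_db - threshold_db) * (1.0 - 1.0 / ratio)


for level in (40, 50, 65, 80):
    print(f"{level} dB SPL in -> {wdrc_gain_db(level):.1f} dB gain")
```

The attack and release times discussed here determine how quickly the gain is allowed to move along this curve when the input level changes.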

Donald J. Schum, PhD, and Ravi Sockalingam, PhD. This article was submitted to HR by Donald J. Schum, PhD, vice president of audiology and professional relations at Oticon Inc, Somerset, NJ, and Ravi Sockalingam, PhD, senior audiologist at Oticon AS in Smørum, Denmark. Correspondence can be addressed to HR or Dr. Schum.

In conversational speech, there can be 10 phonemes or more every second, each with a different input level. Additionally, some phonemes have significant level changes even within the core structure of the individual speech sound.2,3 Attack and release times manage just how sensitive the compression circuit is to these very short-term changes in level.

Specifically, the release time of a nonlinear circuit will fundamentally affect the intensity relationship of one speech sound to the next. If the release time is in the range of approximately 25 to 75 ms, it is considered to be a fast release time, and the goal is to readjust the gain for every individual phoneme. More intense vowels will receive less gain. However, if a less-intense unvoiced consonant follows, the gain is immediately increased.

Take, for example, the word “flash.” There are four phonemes, each at a different natural level. The initial “f” will be the least intense, followed by the moderate level “l”, then the intense “a,” and finally the moderately weak “sh.” A fast acting system would adjust the gain for each of those four phonemes.

If the release time of a compression system is in the range of approximately 200 to 2000 ms, it is considered to be slow acting. The goal of a slow acting system is to keep the operating gain level of the system stable for longer periods of time. If the gain is driven down by an intense vowel, then the gain will stay near that level (will not “release”) for the duration of the release time (for example, 1000 ms), no matter how soft the following phoneme is.

Since stressed vowels occur in ongoing speech 2 to 3 times per second and last 200 ms or more, there will not be an opportunity for a slow acting system to release and increase gain for the softer intervening phonemes. Basically, the series of stressed vowels occurring several times per second keeps the gain of the system at a stable level. This stability remains even during the gaps between words, phrases, and sentences. Only after the talker pauses will the gain change.
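The contrast between fast and slow release can be illustrated with a rough simulation of the word “flash.” The sketch below assumes a simple one-pole level estimator that smooths the level in dB, hypothetical phoneme levels and durations (100 ms each), and exponential time constants standing in for attack and release times. None of these values come from a measured device.

```python
import math


def track_level(levels_db, attack_ms, release_ms, step_ms=1.0):
    """One-pole level estimator with separate attack and release time constants.

    levels_db holds one input level per time step (dB SPL). Smoothing is done
    directly in dB, which is a simplification made for this sketch.
    """
    a_att = math.exp(-step_ms / attack_ms)
    a_rel = math.exp(-step_ms / release_ms)
    est = levels_db[0]
    out = []
    for x in levels_db:
        # Rising input uses the attack constant, falling input the release one.
        a = a_att if x > est else a_rel
        est = a * est + (1.0 - a) * x
        out.append(est)
    return out


# Hypothetical levels (dB SPL) for "f", "l", "a", "sh": 100 ms each, 1 ms steps.
flash = [50.0] * 100 + [62.0] * 100 + [72.0] * 100 + [58.0] * 100

fast = track_level(flash, attack_ms=5.0, release_ms=50.0)
slow = track_level(flash, attack_ms=5.0, release_ms=1000.0)

print(f"estimate at end of 'sh', fast release: {fast[-1]:.1f} dB")
print(f"estimate at end of 'sh', slow release: {slow[-1]:.1f} dB")
```

With these assumed values, the fast estimator falls most of the way back toward the level of the “sh,” so a compressor driven by it would raise the gain for that phoneme. The slow estimator stays near the level of the preceding vowel, so the gain hardly changes.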

Fast Versus Slow Acting WDRC

For many years, it was assumed that, if a hearing aid used WDRC, the circuitry would and should operate as a fast acting system. It seemed logical that, because patients with sensorineural hearing loss present with a reduced dynamic range, less intense phonemes should receive more amplification than more intense phonemes.

It is true that the primary goal of WDRC is to provide greater gain for soft speech than for moderate and louder speech. However, there is an issue as to whether this goal should apply for speech as a long-term signal or for each individual phoneme.

Advocates of slower acting compression systems4 argue that information is embedded in the natural intensity differences between one phoneme and the next. Unvoiced fricatives and stops are supposed to be considerably less intense than stressed vowels. Faster acting systems will alter the natural intensity structure of the speech signal.5 A slower acting WDRC system will preserve these natural phoneme-to-phoneme intensity differences, whereas a faster acting system will destroy this information.

It is important to point out that we use WDRC in hearing aids to improve audibility, not to inherently improve the intelligibility of speech. There is no evidence that the speech signal is ever made more intelligible by compression. Rather, any improvements in speech intelligibility are due to the signal becoming more audible for the patient. Any approach that can make the signal more audible for the patient without changing the natural structure is generally considered to be preferred.

Further, compression systems cannot tell if sound is speech or not. Compression systems only respond to the level of the signal. During ongoing speech, there are many occurrences in which there is a pause between words, phrases, or sentences. Some speech sounds (stops and affricates) also have a silent period as an inherent part of the production. A fast acting system, because it detects a drop in input levels during these pauses in speech, will turn up the gain rapidly. If there is truly no speech at that moment, whatever ambient sound is present in the environment will now be significantly louder.

This effect becomes particularly apparent if speech is being produced against a background of relatively stable competing noise. During the lower intensity periods of speech, the background noise will tend to “balloon up” and become quite noticeable. A natural, stable, unobtrusive background competition now becomes unstable and significantly more apparent.

Several investigations6-9 have established that hearing-impaired listeners tend to prefer the sound quality of slower acting WDRC systems compared to faster acting systems. The differences become particularly apparent in noisy situations. Speech recognition performance with linear or slower acting systems has been documented to be as good as or better than that with faster acting implementations.10-14

The discussion so far has focused on how hearing aids process the speech signal. Of course, speech is not the only signal that the hearing aid must manage. Amplification also has to provide protection against loudness discomfort, especially for rapid onset signals. These are the types of signals that demonstrate the potential limitation of slower acting compression.

If a hearing aid user is listening to speech and a signal occurs suddenly that jumps up above the ongoing level of speech (for example, a cough nearby or a dog bark), that high level input will drive the gain of the hearing aid down. Since the release time is set to be long, the gain will not reset to a higher level appropriate for the ongoing speech signal for perhaps a second or two. During that time, the level of the speech will drop appreciably such that the user will perceive a “drop out” in the speech signal.15

Thus, in a slow acting system, the hearing aid reacts quickly to extraneous sound in order to protect the user from discomfort. Because the hearing aid has no ability to recognize this signal as something different than the ongoing speech signal, it responds as expected: with a significant drop in gain that remains for a long period of time after the intrusive sound goes away.
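The length of such a drop-out can be estimated with a back-of-the-envelope calculation. The sketch below assumes the level estimate decays exponentially in dB with a time constant equal to the release time, which is a simplification of how real detectors behave. The 15 dB overshoot and 2 dB tolerance are illustrative values.

```python
import math


def recovery_time_ms(overshoot_db, tolerance_db, release_ms):
    """Time for an exponentially decaying level estimate to come back
    within tolerance_db of the ongoing speech level.

    Assumes the estimate starts overshoot_db above the speech level.
    """
    return release_ms * math.log(overshoot_db / tolerance_db)


for release_ms in (50.0, 1000.0):
    t = recovery_time_ms(15.0, 2.0, release_ms)
    print(f"release {release_ms:6.0f} ms -> about {t:.0f} ms to recover")
```

Under these assumptions, a 1000 ms release takes about two seconds to recover, in line with the second or two described above, whereas a 50 ms release recovers in roughly a tenth of a second.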

A New Compression Solution

As originally described by Simonsen and Behrens,16 Speech Guard in Oticon Agil was designed to circumvent the compromises inherent in both slower acting and faster acting compression systems. Speech Guard implements a pair of analyzers (Figure 1), and the interaction between these two estimators is the key to the success of the system.

FIGURE 1. The structure of the Speech Guard processor, showing the two independent monitoring systems.

Up until now, the input signal has always been analyzed by a single system—usually using a relatively short analysis window. The level of the input was constantly monitored and that measured level determined the gain applied by the hearing aid circuitry.

The unique design feature of Speech Guard is that it implements two monitoring systems in parallel:

  • One analyzer uses a long averaging window designed to provide an ongoing estimate of the level of the environment. If a stable speech signal is present, this estimator tracks the ongoing overall level of the talker, and adjusts the gain of the amplifier slowly.
  • A second analyzer uses a very short averaging window. This provides information as to the instantaneous input level, as opposed to the overall ongoing level.
  • The levels of the two estimators are compared on a continuous basis. If the two levels are similar, then the input level estimated by the long-term analyzer determines the output of the hearing aid amplifier. If the level of the short-term analyzer is significantly different than the ongoing long-term average, then the level estimate from the short-term analyzer determines the output of the hearing aid amplifier.
  • The amplifier will normally operate with very long time constants, effectively processing the speech signal by making long-term overall volume control adjustments, but treating the signal as a linear amplifier on a short-term phoneme-to-phoneme basis. The gain of the amplifier will be determined by the overall level estimate of the long-term average.
  • If the short-term analyzer detects a signal that is significantly different in level than the long-term estimate, then the hearing aid amplifier transitions to a system with extremely fast time constants. The effect is to immediately respond to a sudden, intrusive sound that jumps in above the level of the ongoing speech signal. The gain is reduced almost instantaneously. However, once the sudden sonic event passes, the gain quickly returns to the level called for by the long-term average with longer time constants.
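The interaction of the two analyzers can be sketched in a few lines of Python. This is only an illustration of the general idea described in the list above, not Oticon's implementation: the time constants, the 10 dB decision threshold, and the choice to freeze the slow estimate while an intrusive sound is present are all assumptions made for the sketch. A practical design would also need logic for sustained changes in level, which is omitted here.

```python
import math


def guided_level(levels_db, slow_ms=1000.0, fast_ms=2.0, jump_db=10.0, step_ms=1.0):
    """Choose, per time step, between a slow and a fast level estimate.

    All parameter values are hypothetical. levels_db holds one input level
    (dB SPL) per time step.
    """
    a_slow = math.exp(-step_ms / slow_ms)
    a_fast = math.exp(-step_ms / fast_ms)
    slow = fast = levels_db[0]
    chosen = []
    for x in levels_db:
        fast = a_fast * fast + (1.0 - a_fast) * x
        if abs(fast - slow) > jump_db:
            # Estimates disagree strongly: let the fast estimate set the gain.
            # The slow estimate is frozen (an assumption of this sketch) so
            # the gain can return to its previous value afterward.
            chosen.append(fast)
        else:
            # Estimates agree: the slow estimate sets the gain and keeps
            # tracking the ongoing level.
            slow = a_slow * slow + (1.0 - a_slow) * x
            chosen.append(slow)
    return chosen


# Steady speech at 70 dB SPL with a 100 ms intrusive sound at 85 dB SPL.
levels = [70.0] * 500 + [85.0] * 100 + [70.0] * 500
chosen = guided_level(levels)

print(f"before the event: {chosen[499]:.1f} dB")
print(f"during the event: {chosen[550]:.1f} dB")
print(f"50 ms after the event: {chosen[650]:.1f} dB")
```

In this toy example, the level that drives the gain jumps to the level of the intrusive sound within a few milliseconds and returns to the long-term speech level almost as soon as the sound ends.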

The Effect on the Speech Signal

FIGURE 2. Speech presented at three levels (70, 80, and then 60 dBSPL): unaided (top panel), through the Fast Acting Syllabic Compressor (middle panel), and through Speech Guard (bottom panel).
FIGURE 3. The level histogram of the sample of speech presented at 70 dBSPL, demonstrating that the majority of the speech signal falls within a 30 dB range, as has long been observed in the literature.

How does Speech Guard actually affect the speech signal? In the following examples, different speech signals were presented to Agil and to another Oticon hearing aid (Vigo Pro) that implements fast acting syllabic compression.

All recordings were done with KEMAR wearing the hearing aids. The hearing aids were programmed for a flat, moderate hearing loss and fit to KEMAR using closed earmolds. Agil implements the Voice Aligned Compression fitting rationale and the Vigo Pro would normally implement the NAL-R fitting rationale. However, for the purposes of these recordings, the response of the Vigo Pro devices above 2000 Hz was adjusted to provide similar gain and compression as the Agil in the higher frequencies.

Figure 2 shows the response of the two processing systems to speech that changes level. The top panel provides the unaided signal. Speech was presented first at 70 dBSPL, followed by a section at 80 dBSPL, and then by a section at 60 dBSPL. The middle panel shows the output of the Fast Acting compressor, and the bottom panel shows the output of Speech Guard processing.

First, let’s look at the effect of Fast Acting Syllabic Compression in the middle panel. Long-term averages for the output signals were calculated for each of the three sections of speech input. Compared to the output for the 70 dB input, the average output for the 80 dB input section increased by 4 dB, which is consistent with the expected effect of Fast Acting Syllabic Compression. When the input level dropped from 80 to 60 dB, the output dropped by 9 dB, again as expected.

Now turning attention to the Speech Guard effects in the bottom panel, the 70 to 80 dB change in input led to a 5 dB increase in output. The drop from 80 to 60 dB input led to an 11 dB drop in output. The data in the bottom panel demonstrate how Speech Guard adjusts the long-term overall gain applied to an ongoing speech signal in a manner consistent with the expected effects of slow acting automatic volume control systems. Although the effect on the average long-term output was similar for Speech Guard and for Fast Acting Syllabic Compression, the effects on a shorter-term basis were significantly different.
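The changes reported above can be summarized as an effective long-term compression ratio, that is, the change in input level divided by the change in output level. The short sketch below is simple arithmetic on the figures given in the text; the function name and labels are just for illustration.

```python
def effective_ratio(delta_in_db, delta_out_db):
    """Effective compression ratio: dB change in input per dB change in output."""
    return delta_in_db / delta_out_db


# (change in input, change in output) in dB, as reported in the text.
measurements = {
    "Fast Acting, 70 -> 80 dB": (10, 4),
    "Fast Acting, 80 -> 60 dB": (20, 9),
    "Speech Guard, 70 -> 80 dB": (10, 5),
    "Speech Guard, 80 -> 60 dB": (20, 11),
}

for label, (d_in, d_out) in measurements.items():
    print(f"{label}: {effective_ratio(d_in, d_out):.1f}:1")
```

The resulting ratios fall between about 1.8:1 and 2.5:1 for both systems, which is the sense in which their long-term behavior is similar.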

One way to see the differences is to compute histograms of the moment-to-moment levels. A histogram provides a count of the number of times a particular SPL occurs in a longer-term sample of sound. In this case, the signals were sampled every 2 ms. Figure 3 provides the histogram of the SPLs for the 70 dBSPL section of the input signal. The y-axis provides the number of times a particular level occurs, and the x-axis provides the level (measured as the number of dB down from the peak of the sample). The red bar shows that, for a speech signal that has not been manipulated by a hearing aid, most of the samples occur within a 30 dB range. This is consistent with the long-held observation that the dynamic range of the speech signal is approximately 30 dB.
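A level histogram of this kind is straightforward to compute. The sketch below assumes a list of short-term levels in dB (one value per 2 ms frame, for instance) and bins them by the number of dB below the peak. The sample values and the 5 dB bin width are made up for illustration and are not the recordings used for the figures.

```python
from collections import Counter


def level_histogram(levels_db, bin_db=5.0):
    """Count how often each level occurs, binned in dB below the peak.

    Keys are the lower edge of each bin (0.0 means within bin_db of the peak).
    """
    peak = max(levels_db)
    counts = Counter(int((peak - x) // bin_db) * bin_db for x in levels_db)
    return dict(sorted(counts.items()))


# Hypothetical short-term levels in dB SPL.
sample_levels = [70.0, 68.0, 66.0, 61.0, 55.0, 48.0, 42.0, 70.0]
hist = level_histogram(sample_levels)

for db_down, count in hist.items():
    print(f"{db_down:4.0f} to {db_down + 5:4.0f} dB below peak: {count}")
```

The spread of the occupied bins gives the short-term dynamic range, which is the quantity compared across the processing schemes in the figures that follow.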

Figure 4 provides the histograms of the output of the Fast Acting processing for 70 (left panel), 80 (middle panel), and 60 dB (right panel) inputs. The red bars now show that the range of outputs has dropped and that the 30 dB range of inputs now covers less than 20 dB in the output range. This effect is entirely consistent with the expected effects of Fast Acting processing. The less intense portions of the input signal receive more gain than the more intense sections.

FIGURE 4. The level histograms for 70 (left panel), 80 (middle panel), and 60 dBSPL (right panel) as processed by the Fast Acting Syllabic Compressor. The 30 dB range of inputs now covers less than a 20 dB output range.

Figure 5 provides the same data, but now for Speech Guard processing. Notice that almost the entire 30 dB range of inputs has been preserved in the output of the hearing aid. When the average is computed over the long-term speech segments, the output changes in a similar manner as with the fast acting processing; however, when measured using a very short time window, the complete dynamic range of the speech signal is preserved.

FIGURE 5. The level histograms for 70 (left panel), 80 (middle panel), and 60 dBSPL (right panel) as processed by Speech Guard. Almost the entire 30 dB of input range is preserved.

The speech signal—as long as the overall level remains the same—is processed in a linear fashion. However, as the long-term level changes from 70 to 80 dBSPL and then drops to 60 dBSPL, Speech Guard adjusts the long-term gain applied to properly scale the speech signal to the patient’s residual dynamic range.

As also documented by Simonsen and Behrens,16 the system behaves as a linear amplifier for the short term as long as the speech signal remains stable, but makes appropriate nonlinear adjustments for long-term changes in the overall input level.

FIGURE 6. Speech with three, high peak energy interruptions presented unaided (top panel) and through Speech Guard (bottom panel).

Handling of Extraneous Sounds

The effects outlined above would have been similar with a traditional WDRC hearing aid with a long release time. However, as discussed, slow acting systems have had problems handling higher-level, abrupt-onset extraneous sounds. These intrusive sounds have had the effect of disrupting the processing of the speech signal. The uniqueness of the Speech Guard system is that it can handle these intrusions without disrupting the intended, floating linear treatment of speech.

Figure 6 shows a recording of speech that was interrupted three times by rapid onset signals (coughing, alarm clock, and a doorbell) that peaked up to 15 dB above the ongoing speech signal. The upper panel provides the input signal, and the lower panel provides the output of Speech Guard. Notice that the intrusive sounds now peak at levels closer to the ongoing speech signal. This demonstrates that these signals were identified and scaled down to a level that was consistent with the nonlinear behavior of the compression strategy.

Figure 7 is the 6-second section of Figure 6 that corresponds to the doorbell, and illustrates how fast the response of Speech Guard is to a high-level, sudden-onset signal. The onset of the doorbell peaked at nearly 15 dB above the ongoing speech signal. The call-out in this figure focuses on the 50 ms around the onset of the doorbell. Notice that the first wave-front of the doorbell is at a high level, but that the rapid response portion of the Speech Guard sound monitoring system identifies the level and reduces the gain before the second wave-front occurs (less than 2 ms).

FIGURE 7. The response of Speech Guard to the doorbell sound, with the callout focusing on the onset of the interrupting signal.

Final Comments

Designing a compression system to effectively manage louder extraneous sounds is not a new concept. Advanced hearing aids have always been able to protect the patient against excessively loud and potentially damaging inputs. Over the years, systems have improved in managing these louder inputs while still preserving sound quality. The limitation has been that the actions of these overload protection systems were tied together with the general processing of all signals that enter the hearing aid.

The unique feature of Speech Guard is the ability to monitor and respond to high level inputs while at the same time monitoring and responding to the ongoing speech signal. The essentially instantaneous identification of inputs that jump above the level of the averaged ongoing signal, the immediate reduction in gain while that intrusive sound is present, and the immediate return to the ongoing gain level are why this processing system is called “Speech Guard”: it is designed to allow us to process the speech signal exactly the way it should be.

Speech is a complex acoustic reflection of the precise movements of the articulators of the mouth. The details matter to the listener. Precise changes in frequency, amplitude, and signal type over time units measured in milliseconds are what differentiate one word from the next. The patient with sensorineural hearing loss already has enough difficulty in processing small acoustic variations; altering this source of information makes the speech signal that much more difficult for the brain to interpret.

When we say that Speech Guard allows us to process the speech signal exactly the way it should be, we mean that the system processes the speech signal with as little alteration as possible. As pointed out by Arehart, Kates, and Anderson,17 essentially any signal processing beyond linear amplification has the potential to negatively affect sound quality.

With Speech Guard, the signal receives frequency shaping to reflect the shape of the hearing loss and long-term gain sufficient to make it audible, but that is it. Its goal is to present the broadest-bandwidth, least disturbed signal possible to the patient’s cognitive system.

Compared to more traditional nonlinear processing, Speech Guard has been demonstrated to provide significantly improved speech understanding in noise and significantly reduced listening effort (R. Sockalingam and M. Holmberg; Oticon white paper, 2010). In challenging listening situations, the subjects in this study benefited from a signal that more closely matched speech in its natural form.

References

  1. Pearsons KS, Bennett RL, Fidell S. Speech levels in various noise environments. Environmental Health Effects Research Series. EPA-600/1-77-025. Washington, DC: Environmental Protection Agency; 1977.
  2. Pickett J. The Sounds of Speech Communication. Baltimore: University Park Press; 1980.
  3. Kent R, Read C. The Acoustic Analysis of Speech. San Diego: Singular Publishing Group Inc; 1992.
  4. Plomp R. The negative effect of amplitude compression in multichannel hearing aids in light of the modulation-transfer function. J Acoust Soc Am. 1988;83:2322-2327.
  5. Stone M, Moore B. Quantifying the effects of fast-acting compression on the envelope of speech. J Acoust Soc Am. 2007;121:1654-1664.
  6. Neuman A, Bakke M, Mackersie C, Hellman S, Levitt H. The effect of release time in compression hearing aids: paired-comparison judgments of quality. J Acoust Soc Am. 1995;98:3182-3187.
  7. Neuman A, Bakke M, Mackersie C, Hellman S, Levitt H. The effect of compression ratio and release time on the categorical rating of sound quality. J Acoust Soc Am. 1998;103:2273-2281.
  8. Van Buuren R, Festen J, Houtgast T. Compression and expansion of the temporal envelope: evaluation of speech intelligibility and sound quality. J Acoust Soc Am. 1999;105:2903-2913.
  9. Boike K, Souza P. Effects of compression ratio on speech recognition and sound quality ratings with wide dynamic range compression. J Sp Lang Hear Res. 2000;43:456-468.
  10. Hansen M. Effects of multi-channel compression time constants on subjectively perceived sound quality and speech intelligibility. Ear Hear. 2002;23:369-380.
  11. Hickson L, Thyer N. Acoustic analysis of speech through a hearing aid: perceptual effects of changes with two-channel compression. J Am Acad Audiol. 2003;14:414-426.
  12. Jenstad L, Souza P. Quantifying the effect of compression hearing aid release time on speech acoustics and intelligibility. J Sp Lang Hear Res. 2005;48:651-667.
  13. Souza P, Turner C. Multichannel compression, temporal cues, and audibility. J Sp Lang Hear Res. 1998;41:315-326.
  14. Stone M, Moore B. Side effects of fast-acting dynamic range compression that affect intelligibility in a competing speech task. J Acoust Soc Am. 2003;116:2311-2324.
  15. Souza P. Effects of compression on speech acoustics, intelligibility, and sound quality. Trends Amplif. 2002;6:131-165.
  16. Simonsen C, Behrens T. A new compression strategy based on a guided level estimator. Hearing Review. 2009;16(13):26-31.
  17. Arehart K, Kates J, Anderson M. Effects of noise, nonlinear processing, and linear filtering on perceived speech quality. Ear Hear. 2010;31:420-436.

Citation for this article:

Schum DJ, Sockalingam R. A new approach to nonlinear signal processing. Hearing Review. 2010;17(7):24-32.