By Douglas L. Beck, AuD, and Jennifer L. Repovsch, AuD

Speech audiometry has long been a standard part of the audiometric diagnostic test battery across the United States. However, for speech audiometry to be diagnostically meaningful, strict adherence to administration standards and protocols is important. Indeed, when speech discrimination tests (aka, word recognition tests) are presented casually (monitored live voice, half lists, etc), the results are rendered relatively meaningless. Further, speech-in-noise (SIN) testing is quite possibly the single most important functional (ie, non-diagnostic) test we can perform. Specifically, SIN tests measure and represent the primary reason most patients seek counsel from hearing care professionals: difficulty understanding speech in noise. Importantly, SIN ability cannot be predicted from pure-tone thresholds.

One could easily argue that SIN tests play a dual role in audiometry. First, as noted above, SIN objectively demonstrates the patient’s difficulty understanding speech in noise. Second, a simple re-test 30 days post-fitting (or post-auditory rehabilitation) can objectively demonstrate improvement in the specific SIN task.

In this article, we’ll explore the pros and cons associated with “speech discrimination” testing, as well as the advantages of SIN testing, and the integration of these protocols in modern audiometric equipment.

Statistical Significance in Word Recognition Score Tests

Lawson¹ reported that, when using the standard 95% critical difference criterion (ie, the most common criterion applied in behavioral statistics), we employ a statistical model to state with 95% certainty whether two scores are the same or different. Specifically, given a patient with a symmetric mild-to-moderate sensorineural hearing loss (SNHL) and a 25-word list from one of the most common word recognition tests (CID W-22, NU-6, etc), if the first Word Recognition Score (WRS) is 88% correct, the range of scores that are statistically the same as the first score (88%) is from 68% to 100%.

Of course, we can improve and tighten the range by using a 50-word list for each presentation; however, few clinicians use 50-word lists. Nonetheless, given a 50-word list, and assuming the first score was (again) 88%, the range of scores statistically the same as the first score (88%) would be from 74% to 96%.

This phenomenon has been known and described for decades (see Thornton and Raffin, 1978²), yet this important interpretive knowledge rarely makes it past the soundbooth door. Specifically, given a WRS of 88% in the first ear (based on a 25-word presentation), if the WRS in the other ear is 72% or 96%, there is an excellent chance many clinicians will assume these scores are statistically different. They are not. The WRS is not a very powerful test, and the apparent “clinical difference” is simply a fallacy. Indeed, if one were to repeat the same test in an hour, a day, or a week, there is a reasonable chance the scores would vary, and they may even reverse!
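To make the arithmetic behind these ranges concrete, the following Python sketch estimates a 95% critical range using the plain normal approximation to the binomial. That choice is an assumption made for illustration: it is not the model behind the published Thornton and Raffin tables, so its bounds land within a few percentage points of the values quoted above rather than reproducing them exactly. The function names are hypothetical.

```python
import math

def critical_range(score_pct, n_words, z=1.96):
    """Approximate range of second scores not significantly different from the first.

    Uses the normal approximation to the binomial (an illustrative
    simplification, not the published critical-difference tables).
    """
    p = score_pct / 100.0
    # Standard error of the difference between two scores from n-word lists
    se_diff = math.sqrt(2.0 * p * (1.0 - p) / n_words)
    margin = 100.0 * z * se_diff
    # Scores cannot fall below 0% or exceed 100%
    return max(0.0, score_pct - margin), min(100.0, score_pct + margin)

def differs(first_pct, second_pct, n_words):
    """True only if the second score falls outside the approximate critical range."""
    low, high = critical_range(first_pct, n_words)
    return not (low <= second_pct <= high)

if __name__ == "__main__":
    for n in (25, 50):
        low, high = critical_range(88, n)
        print(f"{n}-word list, first score 88%: about {low:.0f}% to {high:.0f}%")
    print("88% vs 72% on 25 words differ?", differs(88, 72, 25))
    print("88% vs 96% on 25 words differ?", differs(88, 96, 25))
```

With a 25-word list and a first score of 88%, this approximation puts the lower bound near 70%, so second scores of 72% and 96% both fall inside the range. The approximation is least trustworthy for scores near 0% or 100%, so published critical-difference tables remain the better clinical reference.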

Ears vs Brains

It has been hypothesized that WRS may relate more to the quantity of grey matter in the brain than to the hearing loss demonstrated in the ear. Harris et al³ reported that, even after controlling for hearing loss, structural differences (grey matter volume) within the auditory cortex of older and younger people predicted WRS performance. Specifically, those with less grey matter performed less well on WRS than those with more grey matter.

Jerger⁴ noted, “It is not too soon to give serious consideration to extending the scope of speech audiometric measures by taking advantage of the insights of cognitive scientists and the capabilities of brain electrophysiology.” Jerger referred to Schneider and Pichora-Fuller,⁵ who reported that performing the WRS test integrates what was heard with past (word and vocabulary) knowledge, storage and retrieval within memory, the formulation of a reply, and the act of speech production itself. That is, there is more to the WRS than one might casually suspect.

Presentation Level

Guthrie and Mackersie⁶ noted that the best protocol for setting the WRS presentation level is to test at multiple levels to determine which provides the maximal WRS. They reported that, if the WRS were simply tested at 40 dBSL (re: SRT), uncomfortable loudness levels (UCLs) would be exceeded for some two-thirds of all patients with SRTs of 35 dBHL or greater. Further, a 30 dBSL presentation (re: SRT) would also exceed UCL for most patients with an SRT of 55 dBHL. Of note, Guthrie and Mackersie reported that, for people with moderate to severe SNHL, maximal phoneme recognition was generally obtained at 5 dB below UCL rather than at MCL. They did not advocate “fixed level testing” (such as SRT + 30 or + 40 dB).
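The arithmetic behind this caution can be sketched in a few lines of Python. The SRT and UCL values below describe a hypothetical patient, not data from Guthrie and Mackersie, and the function names are assumptions; the level 5 dB below UCL simply mirrors the finding reported above.

```python
def fixed_level_dbhl(srt_dbhl, sensation_level_db):
    """Presentation level in dBHL for a fixed sensation level re: SRT."""
    return srt_dbhl + sensation_level_db

def exceeds_ucl(srt_dbhl, sensation_level_db, ucl_dbhl):
    """True if the fixed-level presentation would be above the patient's UCL."""
    return fixed_level_dbhl(srt_dbhl, sensation_level_db) > ucl_dbhl

def ucl_minus_5(ucl_dbhl):
    """Level 5 dB below UCL, in dBHL."""
    return ucl_dbhl - 5

if __name__ == "__main__":
    # Hypothetical patient: SRT of 45 dBHL, UCL of 80 dBHL
    srt, ucl = 45, 80
    for sl in (30, 40):
        level = fixed_level_dbhl(srt, sl)
        print(f"SRT + {sl} dB = {level} dBHL; exceeds UCL: {exceeds_ucl(srt, sl, ucl)}")
    print(f"UCL - 5 dB = {ucl_minus_5(ucl)} dBHL")
```

For this hypothetical patient, a 40 dBSL presentation lands above UCL while a 30 dBSL presentation does not, which illustrates why a single fixed sensation level cannot suit every patient.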

Time Allotment

One suspected reason hearing care professionals most often do not use recorded speech materials is the mistaken belief that employing recorded test materials (ie, employing clinical best practices) takes too much time. However, based on more than 500 patient responses, Kochkin⁷ concluded that when verification and validation measures are employed (ie, best practices), the result is 1.2 fewer office visits per patient (on average).

Mendel and Owen⁸ examined how long it actually takes to present recorded versus monitored live voice (MLV) WRS tests, using 50-word NU-6 lists. They evaluated three groups of listeners and determined that MLV testing was indeed faster than using recorded speech materials. However, the average difference in administration time was 49 seconds (total time difference for 50-word presentations), which was not considered a clinically significant difference. Further, they presumed that, given a 25-word list, the difference in administration time between MLV and recorded tests would be less than 30 seconds.

Monitored Live Voice vs Recorded Stimuli: There’s No Contest

Stoppenbach et al⁹ reported that the use of recorded (as opposed to live) speech stimuli has been advocated for more than half a century. They noted that each person presenting a live WRS test actually presents a unique test. Of note, as they published their work at the beginning of the digital era, they reported many advantages of digital recordings (as opposed to analog), including quick access to the test material itself, deterioration resistance, improved fidelity, an excellent signal-to-noise ratio (SNR), a wider frequency response, and “near perfect” channel separation. Stoppenbach et al stated “the reason(s) for audiologists not using recording materials is unknown,” and they speculated that the selection of live versus recorded may boil down to convenience and economy.

The senior author (DLB) states the day of MLV has come and gone. That is, one simply cannot reproduce vocal utterances exactly the same way, time after time. Even if one were to perfectly observe and balance the VU meter every time (an impossible task!), we each take breaths, we each vary the length of the utterance, we each have an accent, some of us are men and some are women, and we each have a different fundamental frequency and multiple harmonics that create our voice. The goal is not to describe how the patient responds to our voice; the goal is to describe the maximal ability of the patient to repeat standardized speech (and/or speech sounds). MLV is absolutely a non-defensible clinical protocol, and frankly, we simply know better!

Speech in Noise

It almost goes without saying that the single most-common complaint of all people with hearing loss and all people using traditional amplification systems is “clearly understanding speech in a background of noise.” Indeed, this complaint is common, pervasive, and is generally the most important issue to the patient.

Unfortunately, it is impossible to estimate or predict the performance level (and frustration) experienced by any person attempting to listen to speech in noise. There is no correlation between hearing thresholds and speech-in-noise ability, and therefore, one must test SIN in order to adequately understand and address the problem.

Of note, throughout 2012 and 2013, the senior author (DLB) has asked the question, “How many of you test every patient every time with regard to their SIN ability?” at dozens of local, state, and regional meetings of hearing care practitioners. The response in the affirmative generally approaches 5% to 10%. Beck¹⁰ recently asked, “Why…is it acceptable for the majority of our patients to complain about speech-in-noise, yet most of us don’t test (ie, measure) it?” Beck and Nilsson¹¹ advocated that SIN testing be included in every diagnostic evaluation, as well as in every hearing aid evaluation. They stated that the only way to truly understand the difficulty a specific patient has with speech in noise is to measure it. They note that audiograms and hearing thresholds simply do not predict SIN scores, and that SIN scores cannot be predicted from the type and degree of hearing loss.

Beck and Nilsson¹¹ report that a multitude of excellent, well-known SIN tests are commercially available (HINT, WIN, QuickSIN, BKB-SIN, etc). Further, reliable and accurate “home-made” SIN tests are easily designed, tested, and normed. Although commercially created SIN tests are preferred, home-made tests are useful and demonstrative (see the American Academy of Audiology interview with Barbara Weinstein¹² and Beck and Nilsson¹¹ for how you can create your own home-made SIN test).

Figure 1. The 2013 MA 41 portable air-bone-speech audiometer by Maico Diagnostics is an example of an audiometer that provides embedded WRS and SIN testing.


Embedded WRS and SIN Tests

Modern audiometers offer advanced benefits, including software with embedded speech tests for WRS and SIN, that provide quick and accurate diagnostic speech testing (WRS) and pragmatic speech challenges (SIN). Digital speech files are embedded into portable and larger audiometers, replacing internal and external tape recorders and CD players, and eliminating the need to calibrate signals before every test.

For example, the new MA 41 (Figure 1) portable audiometer by Maico Diagnostics includes embedded American English speech files for children and adults, Spanish speech files, and the embedded QuickSIN test. Embedding speech files into modern audiometers provides the accuracy of recorded speech materials in tandem with the speed of live-voice testing, while response scores are automatically calculated and stored.


When WRS is evaluated in a scientific and controlled manner (see above), the result is an estimate of the maximum ability of a patient to understand a standardized word list in isolation in a quiet background. Pragmatically, when scientifically presented and interpreted, the WRS is a useful (although gross) measure of word recognition ability in quiet. WRS obtained in quiet represents a “best case scenario” with regard to recognizing words. The WRS in quiet does not (nor was it designed to) represent a significant cognitive or acoustic “challenge” to the patient.

SIN tests are designed to challenge the patient as they present “real world” acoustic environments. They are designed to objectively document the ability (and difficulty) of the patient with respect to clearly understanding speech in difficult listening environments. Speech tests embedded into modern audiometers offer multiple advantages that facilitate repeatable, quick, accurate, and meaningful tests.

Douglas L. Beck, AuD, is Director of Professional Relations at Oticon Inc, Somerset, NJ, and Jennifer L. Repovsch, AuD, is an audiologist at Maico Diagnostics, Eden Prairie, Minn. Correspondence can be addressed to Dr Beck at: [email protected]



1. American Academy of Audiology. Speech audiometry, word recognition, and binomial variables: Interview with Gary Lawson, PhD. Available at:

2. Thornton AR, Raffin MM. Speech-discrimination scores modeled as a binomial variable. J Speech Hear Res. 1978;21[Sept]:507-518.

3. Harris KC, Dubno JR, Keren NI, Ahlstrom JB, Eckert MA. Speech recognition in younger and older adults: a dependency on low-level auditory cortex. J Neurosci. 2009;29(19):6078-6087. doi:10.1523/JNEUROSCI.0412-09.2009

4. Jerger J. Editorial: New horizons in speech audiometry? J Am Acad Audiol. 2010;21(7).

5. Schneider B, Pichora-Fuller K. Implications of perceptual deterioration for cognitive aging research. In: Craik F, Salthouse T, eds. The Handbook of Aging and Cognition. Mahwah, NJ: Lawrence Erlbaum Associates; 2000:155–219.

6. Guthrie LA, Mackersie CL. A comparison of presentation levels to maximize word recognition scores. J Am Acad Audiol. 2009;20(6):381-390. Summary at:

7. Kochkin S. MarkeTrak VIII: Reducing patient visits through verification & validation. Hearing Review. 2011;18(6):10-12. /products/17112-marketrak-viii-reducing-patient-visits-through-verification-amp-validation

8. Mendel LL, Owen SR. A study of recorded versus live voice word recognition. Int J Audiol. 2011;50:688-693.

9. Stoppenbach DT, Craig JM, Wiley TL, Wilson RH. Word recognition performance for Northwestern University Auditory Test No. 6 word lists in quiet and in competing message. J Am Acad Audiol. 1999;10:429-435.

10. Beck DL. Reflections on change, fitting protocols, counseling, audiograms, and more! International Hearing Society Soundboard. April/May 2013.

11. Beck DL, Nilsson M. Speech-in-noise testing—A pragmatic addendum to hearing aid fittings. Hearing Review. 2013;20(5):24-26. Available at: /continuing-education/21662-speech-in-noise-testing-a-pragmatic-addendum-to-hearing-aid-fittings

12. American Academy of Audiology interview with Barbara Weinstein. Available at: