Informational Masking in Speech Recognition

The Auditory System at the Cocktail Party

Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 60)

Abstract

Solving the “cocktail party problem” depends on segregating, selecting, and comprehending the message of one specific talker among competing talkers. This chapter reviews the history of study of speech-on-speech (SOS) masking, highlighting the major ideas influencing the development of theories that have been proposed to account for SOS masking. Much of the early work focused on the role of spectrotemporal overlap of sounds, and the concomitant competition for representation in the auditory nervous system, as the primary cause of masking (termed energetic masking). However, there were some early indications—confirmed and extended in later studies—of the critical role played by central factors such as attention, memory, and linguistic processing. The difficulties related to these factors are grouped together and referred to as informational masking. The influence of methodological issues—in particular the need for a means of designating the target source in SOS masking experiments—is emphasized as contributing to the discrepancies in the findings and conclusions that frequent the history of study of this topic. Although the modeling of informational masking for the case of SOS masking has yet to be developed to any great extent, a long history of modeling binaural release from energetic masking has led to the application/adaptation of binaural models to the cocktail party problem. These models can predict some, but not all, of the factors that contribute to solving this problem. Some of these models, and their inherent limitations, are reviewed briefly here.

Notes

  1. Irwin Pollack (2002; personal communication) attributed his use of the term “informational masking” to influential comments by George A. Miller at a seminar presented by Pollack describing the masking of speech by bands of filtered noise. According to Pollack, Miller objected to (Pollack’s) use of noise as a masker, considering its effects to be “secondary” to the “informational content of the messages” contained in speech maskers.

Acknowledgements

The authors are indebted to Christine Mason for her comments on this chapter and for her assistance with its preparation. Thanks also to Elin Roverud and Jing Mi for providing comments on an earlier version and to the members of the Psychoacoustics Laboratory, Sargent College graduate seminar SLH 810, and Binaural Group for many insightful discussions of these topics. We are also grateful to those authors who generously allowed their figures to be reprinted here and acknowledge the support of the National Institutes of Health/National Institute on Deafness and Other Communication Disorders and Air Force Office of Scientific Research for portions of the research described here.

Compliance with Ethics Requirements

Gerald Kidd, Jr. declares that he has no conflict of interest.

H. Steven Colburn declares that he has no conflict of interest.

Author information

Corresponding author

Correspondence to Gerald Kidd Jr.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Kidd, G., Colburn, H.S. (2017). Informational Masking in Speech Recognition. In: Middlebrooks, J., Simon, J., Popper, A., Fay, R. (eds) The Auditory System at the Cocktail Party. Springer Handbook of Auditory Research, vol 60. Springer, Cham. https://doi.org/10.1007/978-3-319-51662-2_4
