The AMI Speaker Diarization System for NIST RT06s Meeting Data

van Leeuwen, David A.; Huijbregts, Marijn

doi:10.1007/11965152_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

We describe the systems submitted to the NIST RT06s evaluation for the Speech Activity Detection (SAD) and Speaker Diarization (SPKR) tasks. For speech activity detection, a new analysis methodology is presented that generalizes the Detection Error Tradeoff analysis commonly used in speaker detection tasks. The speaker diarization systems are based on the TNO and ICSI systems submitted for RT05s. For the Single Distant Microphone condition of the conference room evaluation, the SAD system performs well at a 4.23 % error rate, and the ‘HMM-BIC’ SPKR system performs competitively at an error rate of 37.2 %, including overlapping speech.
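
As background for the Detection Error Tradeoff analysis mentioned above, the following is a minimal sketch of how the points of a standard DET curve can be computed from detector scores. It illustrates only the conventional two-class analysis, not the generalization presented in the paper, which the abstract does not detail. The function name `det_points`, the score lists, and the convention that a higher score means "speech" are all assumptions made for illustration.

```python
from typing import List, Tuple


def det_points(
    target_scores: List[float], nontarget_scores: List[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    Assumed convention: a trial is accepted when its score >= threshold.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a target (e.g. speech) trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target (e.g. non-speech) trial scored at or above it.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((t, miss, fa))
    return points


# Hypothetical scores, invented purely for this example.
speech_scores = [0.9, 0.8, 0.4, 0.7]
nonspeech_scores = [0.1, 0.3, 0.5, 0.2]

for threshold, miss_rate, fa_rate in det_points(speech_scores, nonspeech_scores):
    print(f"threshold={threshold:.1f} miss={miss_rate:.2f} false_alarm={fa_rate:.2f}")
```

Sweeping the threshold trades misses against false alarms; a DET plot shows these pairs on normal-deviate axes, and the point where the two rates are equal is the equal error rate.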

Author information

Authors and Affiliations

TNO Human Factors, Postbus 23, 3769, Soesterberg, The Netherlands
David A. van Leeuwen
Department of EEMCS, Human Media Interaction, University of Twente, Enschede, The Netherlands
Marijn Huijbregts

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

van Leeuwen, D.A., Huijbregts, M. (2006). The AMI Speaker Diarization System for NIST RT06s Meeting Data. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_33

DOI: https://doi.org/10.1007/11965152_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)