
Selecting Representative Speakers for a Speech Database on the Basis of Heterogeneous Similarity Criteria

Chapter in: Speaker Classification II

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

In the context of the Neologos French speech database creation project, a general methodology was defined for selecting representative speaker recordings. The selection aims to provide good coverage of speaker variability while limiting the number of recorded speakers. This is intended to make the resulting database both better suited to the development of recently proposed multi-model methods and less expensive to collect.

The proposed methodology performs the selection by optimizing a quality criterion that can be defined in a variety of speaker similarity modeling frameworks. The selection can be made with respect to a single similarity criterion, using classical clustering methods such as hierarchical or K-medians clustering, or it can combine several speaker similarity criteria by means of a newly developed clustering method called Focal Speakers Selection.
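To illustrate selection under a single similarity criterion, here is a minimal Python sketch of a K-medians-style clustering. It is not the authors' implementation. It assumes a precomputed, symmetric matrix of non-negative speaker-to-speaker distances, for example one derived from one of the similarity criteria. It also assumes, as its quality criterion, the total distance from every speaker to its closest selected representative. Because only pairwise distances are available, each representative is an actual speaker, chosen as the medoid of its cluster (a K-medoids-style variant). The names `quality` and `k_medoids` are hypothetical and used only for illustration.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical type: dist[i][j] is the (assumed symmetric, non-negative)
# distance between speakers i and j under one similarity criterion.
Matrix = Sequence[Sequence[float]]


def quality(dist: Matrix, reps: Sequence[int]) -> float:
    """Assumed quality criterion: summed distance from every speaker to its
    closest representative (lower means better coverage)."""
    return sum(min(dist[i][r] for r in reps) for i in range(len(dist)))


def k_medoids(dist: Matrix, k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Select k representative speakers (sketch, not the published method).

    Alternates between assigning each speaker to its nearest representative
    and replacing each representative by its cluster's medoid, i.e. the
    member with the smallest summed distance to the other members.
    """
    n = len(dist)
    rng = random.Random(seed)
    reps = sorted(rng.sample(range(n), k))
    for _ in range(n_iter):
        # Assignment step; a representative always stays in its own cluster,
        # so no cluster can become empty.
        clusters: Dict[int, List[int]] = {r: [] for r in reps}
        for i in range(n):
            nearest = i if i in clusters else min(reps, key=lambda r: dist[i][r])
            clusters[nearest].append(i)
        # Update step: the new representative of each cluster is its medoid.
        new_reps = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_reps == reps:  # converged: the selection no longer changes
            break
        reps = new_reps
    return reps


if __name__ == "__main__":
    # Toy data: six "speakers" placed on a line, forming two groups.
    positions = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in positions] for a in positions]
    reps = k_medoids(dist, k=2)
    print(reps, quality(dist, reps))
```

On this toy example the sketch selects speakers 1 and 4, the middle speaker of each group, with a total distance of 4. Choosing medoids rather than averaged centroids keeps every selected representative a real, recordable speaker, which is what a database selection requires.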

In this framework, four different speaker similarity criteria are tested, and three different speaker clustering algorithms are compared. Results pertaining to the collection of the Neologos database are also discussed.

The Neologos project was funded by the French Ministry of Research in the framework of the Technolangue program.

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Krstulović, S., Bimbot, F., Boëffard, O., Charlet, D., Fohr, D., Mella, O. (2007). Selecting Representative Speakers for a Speech Database on the Basis of Heterogeneous Similarity Criteria. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_21

  • DOI: https://doi.org/10.1007/978-3-540-74122-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74121-3

  • Online ISBN: 978-3-540-74122-0

  • eBook Packages: Computer Science (R0)
