Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech

Raikar, Aditya; Gandhi, Ami; Patil, Hemant A.

doi:10.1007/978-3-319-24033-6_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Whispering is an alternative mode of speech communication, used especially when a speaker does not want to reveal information to anyone other than the intended listeners. Generally, speaker-specific information is present in both the excitation source and the vocal tract system. However, whispered speech does not contain significant source characteristics, as there is almost no excitation by the vocal folds, and the speaker information in the vocal tract system is also lower than in normal speech. Hence, it is difficult to recognize a speaker from his or her whispered speech. To address this, features based on vocal tract system characteristics are proposed, namely the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and the recently developed Cochlear Frequency Cepstral Coefficients (CFCC). Experiments are conducted on the CHAINS (Characterizing Individual Speakers) whispered speech database using the GMM-UBM (Gaussian Mixture Modeling-Universal Background Modeling) approach. The experiments show that fusing CFCC and MFCC improves both % IR (Identification Rate) and % EER (Equal Error Rate) over MFCC alone, indicating that the proposed features and their score-level fusion capture complementary speaker-specific information.
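
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of score-level fusion together with the two reported metrics, % IR and % EER. The abstract does not state the fusion rule, so the weighted sum, the weight `alpha`, and all function names here are assumptions chosen for illustration, and the toy score matrices are invented and are not results from the paper.

```python
import numpy as np


def fuse_scores(mfcc_scores, cfcc_scores, alpha=0.5):
    """Weighted-sum fusion of two score matrices (trials x speakers).

    ASSUMPTION: a linear combination with weight `alpha`; the abstract
    only says "score-level fusion" and does not give the rule.
    """
    mfcc_scores = np.asarray(mfcc_scores, dtype=float)
    cfcc_scores = np.asarray(cfcc_scores, dtype=float)
    return alpha * mfcc_scores + (1.0 - alpha) * cfcc_scores


def identification_rate(scores, true_ids):
    """% IR: share of trials whose top-scoring speaker is the true one."""
    predicted = np.argmax(np.asarray(scores, dtype=float), axis=1)
    return 100.0 * np.mean(predicted == np.asarray(true_ids))


def equal_error_rate(genuine, impostor):
    """Approximate % EER by sweeping thresholds over the observed scores.

    Returns the mean of the false-acceptance and false-rejection rates
    at the threshold where the two are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, 100.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), 100.0 * (far + frr) / 2.0
    return eer


# Invented toy scores: 3 test utterances scored against 3 speaker models.
true_ids = [0, 1, 2]
mfcc = [[2.0, 1.0, 0.0], [0.5, 1.0, 0.8], [0.2, 0.9, 0.3]]
cfcc = [[1.5, 1.0, 0.2], [0.9, 0.7, 0.1], [0.1, 0.2, 1.5]]
fused = fuse_scores(mfcc, cfcc, alpha=0.5)

print("IR, MFCC only:", identification_rate(mfcc, true_ids))
print("IR, CFCC only:", identification_rate(cfcc, true_ids))
print("IR, fused    :", identification_rate(fused, true_ids))
```

In this toy case each feature stream misclassifies a different utterance, so the fused scores recover both errors; that is the sense in which two streams can carry complementary information. In practice `alpha` would be tuned on held-out data rather than fixed at 0.5.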

Author information

Authors and Affiliations

DA-IICT, Gandhinagar, India
Aditya Raikar, Ami Gandhi & Hemant A. Patil

Corresponding author

Correspondence to Aditya Raikar.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Raikar, A., Gandhi, A., Patil, H.A. (2015). Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_46

DOI: https://doi.org/10.1007/978-3-319-24033-6_46
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
