Abstract
Whispering is an alternative mode of speech communication, used especially when a speaker does not want to reveal information to anyone other than the intended listeners. Generally, speaker-specific information is present in both the excitation source and the vocal tract system. However, whispered speech does not contain significant source characteristics because there is almost no excitation by the vocal folds, and the vocal tract system also carries less speaker information than it does in normal speech. Hence, it is difficult to recognize a speaker from his/her whispered speech. To address this, features based on vocal tract system characteristics are proposed: the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and the recently developed Cochlear Frequency Cepstral Coefficients (CFCC). Experiments are conducted on the CHAINS (Characterizing Individual Speakers) whispered speech database using the GMM-UBM (Gaussian Mixture Modeling-Universal Background Modeling) approach. The experiments show that fusing CFCC and MFCC improves both the % IR (Identification Rate) and the % EER (Equal Error Rate) relative to MFCC alone, indicating that the proposed features and their score-level fusion capture complementary speaker-specific information.
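The abstract names score-level fusion and the two evaluation measures (% IR and % EER) without giving their form. The sketch below illustrates one common reading, under stated assumptions: fusion is taken to be a weighted sum of the per-model scores from an MFCC system and a CFCC system, and the EER is approximated by treating every (trial, speaker model) pair as a verification trial. The function names, the weight `alpha`, and all score values are hypothetical toy choices, not the authors' setup or results.

```python
def fuse_scores(mfcc_scores, cfcc_scores, alpha=0.5):
    """Weighted-sum fusion of two score matrices (trials x speaker models).

    Assumption: a simple convex combination; the paper may use another rule.
    """
    return [
        [alpha * m + (1.0 - alpha) * c for m, c in zip(m_row, c_row)]
        for m_row, c_row in zip(mfcc_scores, cfcc_scores)
    ]


def identification_rate(scores, true_speakers):
    """Percentage of trials whose top-scoring model is the true speaker."""
    correct = sum(
        1
        for row, truth in zip(scores, true_speakers)
        if max(range(len(row)), key=row.__getitem__) == truth
    )
    return 100.0 * correct / len(true_speakers)


def equal_error_rate(scores, true_speakers):
    """Approximate % EER by sweeping a threshold over all observed scores.

    Returns the mean of the false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    genuine = [row[t] for row, t in zip(scores, true_speakers)]
    impostor = [
        s
        for row, t in zip(scores, true_speakers)
        for j, s in enumerate(row)
        if j != t
    ]
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine + impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)
        frr = sum(s < thr for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), 100.0 * (far + frr) / 2.0
    return eer


# Toy scores (made up for illustration): 3 test trials x 3 speaker models.
mfcc = [[2.0, 1.0, 0.5], [0.2, 0.9, 1.0], [0.1, 0.3, 1.5]]
cfcc = [[1.5, 0.7, 0.2], [0.3, 1.4, 0.6], [0.4, 0.2, 1.1]]
truth = [0, 1, 2]  # index of the true speaker for each trial

fused = fuse_scores(mfcc, cfcc, alpha=0.5)
for name, s in (("MFCC", mfcc), ("CFCC", cfcc), ("fused", fused)):
    print(name, identification_rate(s, truth), equal_error_rate(s, truth))
```

In practice the scores would be log-likelihood ratios from GMM-UBM models, and the fusion weight would be tuned on held-out data rather than fixed at 0.5.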
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Raikar, A., Gandhi, A., Patil, H.A. (2015). Combining Evidences from Mel Cepstral and Cochlear Cepstral Features for Speaker Recognition Using Whispered Speech. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_46
DOI: https://doi.org/10.1007/978-3-319-24033-6_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)