Speaker identification using harmonic structure of LP-residual spectrum

Hayakawa, Shoji; Takeda, Kazuya; Itakura, Fumitada

doi:10.1007/BFb0016002


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Included in the following conference series:

International Conference on Audio- and Video-Based Biometric Person Authentication


Abstract

The harmonic structure of the LP-residual spectrum differs between speakers, so it may be useful for speaker recognition. To test this hypothesis, Power Difference of Spectra in Subband (PDSS) is proposed as a new feature parameter that extracts information about the harmonic structure of the linear prediction residual spectrum. VQ-based text-independent speaker identification experiments with 25 male and 25 female speakers are conducted to investigate the speaker identification ability of PDSS. The experimental results show that PDSS alone achieves a maximum identification rate of 66.9%. In addition, combining the LPC cepstrum with PDSS reduces identification errors by 41.2% compared with using the LPC cepstrum alone. Moreover, combining the LPC cepstrum with both the delta cepstrum and PDSS reduces identification errors by 52.4% relative to the LPC cepstrum alone. These results show that PDSS complements the LPC cepstrum and delta cepstrum and improves speaker identification performance.
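The abstract does not define how PDSS is computed. The Python sketch below is a minimal, hypothetical illustration of the general idea of measuring harmonic structure in an LP-residual spectrum. It windows a frame, fits LP coefficients with the Levinson-Durbin recursion, and inverse-filters the frame. It then scores each subband of the residual power spectrum as one minus its spectral flatness (geometric mean divided by arithmetic mean).

Several details are assumptions and are not taken from the paper: the flatness-based score, the function names (`pdss_like` and its helpers), and the parameters (Hamming window, LP order 12, 8 subbands).

```python
import cmath
import math

# HYPOTHETICAL sketch: the per-subband score (1 - geometric mean / arithmetic
# mean of residual power, i.e. one minus a spectral-flatness measure) is an
# assumption for illustration, not the paper's definition of PDSS.


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a (windowed) frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Inverse-filter coefficients a[0..order] with a[0] = 1, so that
    e[n] = sum_j a[j] * x[n - j] is the LP prediction residual."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. silence): stop the recursion
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=12):
    """Window the frame, fit LP coefficients, and inverse-filter it."""
    xw = [s * w for s, w in zip(frame, hamming(len(frame)))]
    a = levinson_durbin(autocorrelation(xw, order), order)
    # Zero initial conditions: samples before the frame start count as 0.
    return [sum(a[j] * xw[n - j] for j in range(order + 1) if n - j >= 0)
            for n in range(len(xw))]


def power_spectrum(x):
    """Naive DFT power for bins 0..N/2 (fine for a short demo frame)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def pdss_like(frame, order=12, n_subbands=8, eps=1e-12):
    """Per-subband peakiness of the LP-residual power spectrum.

    Values near 1: the subband is dominated by a few peaks (e.g. harmonics).
    Values near 0: the subband is flat (noise-like or impulse-like)."""
    power = power_spectrum(lp_residual(frame, order))[1:]  # drop the DC bin
    width = len(power) // n_subbands
    features = []
    for b in range(n_subbands):
        band = [p + eps for p in power[b * width:(b + 1) * width]]
        arith = sum(band) / len(band)
        geo = math.exp(sum(math.log(p) for p in band) / len(band))
        features.append(max(0.0, 1.0 - geo / arith))  # clamp rounding below 0
    return features
```

As an example, take a 256-sample frame with an impulse every 20 samples, which is a crude stand-in for voiced excitation. It should score well above 0.5 in every subband. A single impulse has a flat spectrum, so it should score essentially 0. The naive DFT keeps the sketch dependency-free; a practical implementation would use an FFT.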



Author information

Authors and Affiliations

Nagoya University, Furo-cho 1 Chikusa-ku, 464-01, Nagoya, Japan
Shoji Hayakawa, Kazuya Takeda & Fumitada Itakura


Editor information

Josef Bigün, Gérard Chollet, Gunilla Borgefors


About this paper

Cite this paper

Hayakawa, S., Takeda, K., Itakura, F. (1997). Speaker identification using harmonic structure of LP-residual spectrum. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016002

DOI: https://doi.org/10.1007/BFb0016002
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive