Abstract
The harmonic structure of the LP-residual spectrum differs between speakers and may therefore be useful for speaker recognition. To test this hypothesis, the Power Difference of Spectra in Subband (PDSS) is proposed as a new feature parameter that extracts information about the harmonic structure of the linear prediction (LP) residual spectrum. VQ-based text-independent speaker identification experiments with 25 male and 25 female speakers are conducted to investigate the speaker identification ability of PDSS. The experimental results show that PDSS alone achieves a maximum identification rate of 66.9%. In addition, combining the LPC cepstrum with PDSS reduces identification errors by 41.2% compared with using the LPC cepstrum alone. Moreover, combining the LPC cepstrum with both the delta cepstrum and PDSS reduces identification errors by 52.4% relative to the LPC cepstrum alone. These results show that PDSS complements the LPC cepstrum and delta cepstrum in improving speaker identification performance.
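The abstract does not state how PDSS is computed, so the following is only an illustrative sketch of the general idea: obtain the LP residual of a frame, take its power spectrum, and measure in each subband how far the spectrum departs from flat. It assumes a spectral-flatness-style measure (one minus the ratio of geometric to arithmetic mean of the power in each subband), which may differ from the paper's actual definition. The function names, the LPC order of 12, the 8 subbands, and the 512-point FFT are all hypothetical choices.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns predictor coefficients a[1..order]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(frame, order):
    """Inverse-filter the frame: e[n] = s[n] - sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    a = lpc_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(frame, inverse_filter)[:len(frame)]

def subband_flatness_feature(frame, order=12, n_subbands=8, n_fft=512):
    """Per-subband (1 - geometric mean / arithmetic mean) of the residual
    power spectrum. ASSUMPTION: a stand-in for a PDSS-like feature, not
    the definition used in the paper."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    residual = lp_residual(windowed, order)
    power = np.abs(np.fft.rfft(residual, n_fft)) ** 2 + 1e-12
    bands = np.array_split(power[1:], n_subbands)  # skip the DC bin
    features = []
    for band in bands:
        geometric = np.exp(np.mean(np.log(band)))
        arithmetic = np.mean(band)
        features.append(1.0 - geometric / arithmetic)
    return np.array(features)
```

Under this assumed measure, a residual with pronounced harmonic peaks yields values closer to 1, while a noise-like residual yields lower values. In a VQ-based system such as the one the abstract describes, per-frame vectors of this kind would be quantized against a codebook for each speaker.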
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hayakawa, S., Takeda, K., Itakura, F. (1997). Speaker identification using harmonic structure of LP-residual spectrum. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016002
DOI: https://doi.org/10.1007/BFb0016002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive