Abstract
In this paper, we propose an approach for the analysis and detection of acoustic events in speech signals using the Bessel series expansion. The acoustic events analyzed are the voice onset time (VOT) and the glottal closure instants (GCIs). The hypothesis is that Bessel functions, which resemble damped sinusoids, are better suited as basis functions for representing speech signals than the sinusoids used in the conventional Fourier representation. The speech signal is band-pass filtered by choosing an appropriate range of Bessel coefficients to obtain a narrow-band signal, which is then decomposed into amplitude modulated (AM) and frequency modulated (FM) components. The discrete energy separation algorithm (DESA) is used to compute the amplitude envelope (AE) of the narrow-band AM-FM signal. Events such as the consonant and vowel beginnings in an unvoiced stop consonant vowel (SCV) unit and the GCIs are derived by processing the AE of the signal. For VOT detection, the proposed approach using the Bessel expansion is shown to perform better than the conventional Fourier representation. The performance of the proposed GCI detection method using the Bessel series expansion is compared against some existing methods under various noise environments and signal-to-noise ratios.
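The amplitude-envelope step can be illustrated with a minimal sketch. The abstract does not say which DESA variant is used, so the code below assumes DESA-2, built on the Teager-Kaiser energy operator. The function names `teager_energy` and `desa2_envelope` and the `eps` guard are hypothetical, chosen only for this illustration. The input is taken to be a narrow-band signal that has already been obtained, for example by band-pass filtering through a range of Bessel coefficients, which is not shown here.

```python
import numpy as np


def teager_energy(x):
    """Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2_envelope(x, eps=1e-12):
    """Estimate amplitude envelope and frequency (rad/sample) with DESA-2.

    Assumption: DESA-2 is used only as an example variant; the source
    abstract does not specify which DESA form the authors apply.
    Outputs are aligned with x[2:-2], so they have len(x) - 4 samples.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)[1:-1]      # energy of x, aligned with x[2:-2]
    y = x[2:] - x[:-2]                  # symmetric difference x[n+1] - x[n-1]
    psi_y = np.maximum(teager_energy(y), eps)  # guard against division by zero
    # DESA-2 amplitude estimate: |a[n]| ~ 2 * psi[x] / sqrt(psi[y])
    amp = 2.0 * np.abs(psi_x) / np.sqrt(psi_y)
    # DESA-2 frequency estimate: 0.5 * arccos(1 - psi[y] / (2 * psi[x]))
    ratio = np.clip(1.0 - psi_y / (2.0 * np.maximum(psi_x, eps)), -1.0, 1.0)
    freq = 0.5 * np.arccos(ratio)
    return amp, freq


# Example: a slowly amplitude-modulated tone standing in for a narrow-band signal.
n = np.arange(400)
carrier = np.cos(0.3 * n)
envelope = 1.0 + 0.5 * np.cos(0.01 * n)
amp_est, freq_est = desa2_envelope(envelope * carrier)
```

For a pure sinusoid the DESA-2 amplitude and frequency estimates are exact up to rounding; for real speech the envelope is typically noisy, and any smoothing applied before events such as GCIs are located would be a further design choice that this sketch does not cover.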
Acknowledgements
The authors would like to thank the Department of Information Technology (DIT), Government of India, and the Defense Research and Development Organization (DRDO), Government of India, for supporting this activity through sponsored research projects. The second author would also like to thank the Academy of Finland (Finnish Centre of Excellence in Computational Inference Research COIN, 251170) and the European Community’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 287678 (Simple4All) for supporting his stay in Finland as a postdoctoral researcher.
Cite this article
Prakash, C., Gowda, D.N. & Gangashetty, S.V. Analysis of Acoustic Events in Speech Signals Using Bessel Series Expansion. Circuits Syst Signal Process 32, 2915–2938 (2013). https://doi.org/10.1007/s00034-013-9596-1
DOI: https://doi.org/10.1007/s00034-013-9596-1