Generalized Recognition of Sound Events: Approaches and Applications

  • Chapter
Multimedia Services in Intelligent Environments

Part of the book series: Studies in Computational Intelligence (SCI, volume 120)

Summary

This chapter surveys contemporary approaches to automatic sound recognition and discusses the benefits stemming from real-world applications of this technology. We identify the common aspects and subtle differences among these diverse application areas and review state-of-the-art systems. In this context, we suggest that there is considerable room for knowledge transfer between the different subfields of sound classification, which seem to evolve independently and have reached different levels of maturity. Particular emphasis is given to lessons learned from the speech recognition paradigm: speech recognition and speaker recognition were among the first applications of sound classification to reach the stage of commercial products launched on a large scale. Special attention is paid to emerging applications such as environmental monitoring, bioacoustic identification, and music applications, which have already started to alter our everyday life.


References

  1. Deng, L., O’Shaughnessy, D., Speech Processing: A Dynamic and Optimization-Oriented Approach, Marcel Dekker, New York, 2003.

    Google Scholar 

  2. Garces, M., Hetzer, C., Merrifield, M., Willis, M., Aucan, J., Observations of surf infrasound in Hawai’I, In Geophysical Research Letters, pp. 2264-2267,2003.

    Google Scholar 

  3. Auckland, D.W., McGrail, A.J., Smith, C.D., Varlow, B.R., Zhao, J., Zhu, D., The application of ultrasound to the inspection of insulation, In Proceedings of the IEEE 5th International Conference on Conduction and Breakdown in Solid Dielectrics, pp. 590-594, 1995.

    Google Scholar 

  4. Höge, H., Draxler, C., Van den Heuvel, H., Johansen, F.T., Sanders, E., Tropf, H.S., SpeechDat multilingual speech databases for teleservices: across the finish line, In Proceedings of the Eurospeech’99, Budapest, vol. 6, pp. 2699-2702, 1999.

    Google Scholar 

  5. Benyassine, A., Shlomot, E., Su, H.-Y., ITU recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications, In IEEE Communications Magazine, pp. 64-73, 1997.

    Google Scholar 

  6. Sohn, J., Kim, N.S., Sung, W., A statistical model-based voice activity detec-tion, In IEEE Signal Processing Letters, vol. 6, pp. 1-3, 1999.

    Article  Google Scholar 

  7. Cho, Y.D., Kondoz, A., Analysis and improvement of a statistical model-based voice activity detector, In IEEE Signal Processing Letters, vol. 8, pp. 276-278,2001.

    Article  Google Scholar 

  8. Chollet, G., Automatic Speech and Speaker Recognition: Overview, Current Issues and Perspectives, In Keller, E. (Ed.), Fundamentals of Speech Synthesis and Speech Recognition. Basic Concepts, State of the Art and Future Chal-lenges. Chichester, Wiley, pp. 129-148, 1994.

    Google Scholar 

  9. Zue, V., Cole, R., Ward, W., Speech Recognition, In Cole, R.A., Mariani, J., Uszkoreit, H., Zaenen, A., Zue, V. (Eds.), Survey of the State of the Art in Hu-man Language Technology, Cambridge, Cambridge University Press, pp. 4-10, 1997.

    Google Scholar 

  10. Reynolds, D.A., Rose, R.C., Robust text-independent speaker identification using Gaussian mixture speaker models, In IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.

    Article  Google Scholar 

  11. Furui, S., Speaker Recognition, In Cole, R. (Ed.), Survey of the State of the Art in Human Language Technology, Chapter 1.7, Oregon Health & Science U., 1996.

    Google Scholar 

  12. Gish, H., Schmidt, M., Text-idependent speaker identification, In IEEE Signal Processing Magazine, vol. 11, no. 4, pp.18-32, October 1994.

    Article  Google Scholar 

  13. Zervas, P., Mporas, I., Fakotakis, N., Kokkinakis, G., Evaluating intonational features for emotion recognition from speech, In International Journal of Ar-tificial Intelligence Tools, 2007.

    Google Scholar 

  14. Kwon, O., Chan, K., Hao, J., Lee, T., Emotion recognition by speech signals, In Proceedings of the Eurospeech’03, Geneva, pp. 125-128, 2003.

    Google Scholar 

  15. Muthusamy, Y., Barnard, E., Cole, R., Reviewing automatic language recog-nition, In IEEE Signal Processing Magazine, pp. 33-41, October 1994.

    Google Scholar 

  16. Hansen, J., Arslan, L., Foreign accent classification using source genera-tor based prosodic features, In Proceedings of the ICASSP’95, Detroit, MI, pp. 836-839, 1995.

    Google Scholar 

  17. Hansen, J.H.L., Gavidia-Ceballos, L., Kaiser, J.F., A nonlinear based speech feature analysis method with application to vocal fold pathology assessment, In IEEE Transactions on Biomedical Engineering, vol. 45, no. 3, pp. 300-313, March 1998.

    Article  Google Scholar 

  18. Gavidia-Ceballos, L., Hansen, J.H.L., direct speech feature estimation using an iterative EM algorithm for vocal cancer detection, In IEEE Transactions on Biomedical Engineering, vol. 43, no. 4, pp. 373-383, April 1996.

    Article  Google Scholar 

  19. Tzanetakis, G., Cook, P., Musical Genre classification of audio signals, In IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.

    Article  Google Scholar 

  20. Gouyon, F., Dixon, S., Pampalk, E., Widmer, G., Evaluating rhythmic descrip-tors for musical genre classification, In Proceedings of the AES 25th Interna-tional Conference, London, United Kingdom, June 17-19, 2004.

    Google Scholar 

  21. FitzGerald, D., Coyle, E., Lawlor, B., Sub-band independent subspace analysis for drum transcription, In Proceedings of the DAFX’02, pp. 65-69, 2002.

    Google Scholar 

  22. Klapuri, A., Davy, M., (Eds.), Signal Processing Methods for Music Transcrip-tion, Springer, Berlin Heidelberg New York, 2006.

    Google Scholar 

  23. Widmer, G. (Ed.), Special Issue on Machine Learning in Music, In Machine Learning, vol. 65, no. 2-3, December 2006.

    Google Scholar 

  24. Eggink, J., Brown, G.J., Instrument recognition in accompanied sonatas and concertos, In Proceedings of the ICASSP’04, Montreal, Canada, pp. 217-220, May 2004.

    Google Scholar 

  25. Livshin, A.A., Rodet, X., Musical instrument identification in continuous recordings, In Proceedings of the DAFX’04, Naples, Italy, October 5-8, 2004.

    Google Scholar 

  26. Peeters, G., Automatic classification of large musical instrument databases using hierarchical classifiers with inertia ratio maximization, In Proceedings of the AES 115th convention, New York, USA, October 10-13, 2003.

    Google Scholar 

  27. Eggink, J., Brown, G.J., A missing feature approach to instrument identifi-cation in polyphonic music, In Proceedings of the ICASSP’03, Hong Kong, pp. 553-556, April 2003.

    Google Scholar 

  28. Liu, M., Wan, C., Feature selection for automatic classification of musical in-strument sounds, In Proceedings of the 1st ACM/IEEE-CS Joint conference on Digital libraries, pp. 247-248, 2001.

    Google Scholar 

  29. Essid, S., Richard, G., David, B., Efficient musical instrument recognition on solo performance music using basic features, In Proceedings of the AES 25th International Conference, London, UK, June 2004.

    Google Scholar 

  30. Herrera, P., Yeterian, A., Gouyon, F., Automatic classification of drum sounds: a comparison of feature selection methods and classification techniques, In Proceedings of Second International Conference on Music and Artificial Intelligence, Edinburgh, Scotland, 2002.

    Google Scholar 

  31. Eronen, A., Musical instrument recognition using ICA-based transform of fea-tures and discriminatively trained HMMs, In Proceedings of the Seventh Inter-national Symposium on Signal Processing and it’s Applications, pp. 133-136, July 2003.

    Google Scholar 

  32. Eronen, A., Klapuri, A., Musical instrument recognition using cepstral coeffi- cients and temporal features, In Proceedings of the ICASSP’00, pp. 753-756, 2000.

    Google Scholar 

  33. Brown, J.C., Houix, O., McAdams, S., Feature dependence in the automatic identification of musical woodwind instruments, In Journal of the Acoustical Society of America, vol. 109, no. 3, pp. 1064-1072, March 2000.

    Article  Google Scholar 

  34. Herrera, P., Peeters, G., Dubnov, S., Automatic classification of musical in-strument sounds, New Music Research, vol. 32, no. 1, 2003.

    Google Scholar 

  35. Peeters, G., Rodet, X., Automatically selecting signal descriptors for sound classification. In Proceedings of the ICMC’02, Goteborg, Sweden, September 2002.

    Google Scholar 

  36. Wold, T., Blum, D., Wheaton, J., Content-based classification, search, and retrieval of audio, In Proceedings of the IEEE Multimedia, vol.3, no.3, pp. 2736, 1996.

    Google Scholar 

  37. Slaney, M., Mixtures of probability experts for audio retrieval and indexing, In Proceedings of the IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, vol. 1, pp. 345-348, August 2002.

    Google Scholar 

  38. Berenzweig, A., Ellis, D.P.W., Lawrence, S., Anchor space for classification and similarity measurement of music, In Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, pp. 29-32, 2003.

    Google Scholar 

  39. Drosopoulos, S., Claridge M., Insect sounds and communication: physiology, behaviour, ecology, and evolution, Contemporary Topics in Entomology, CRC Press, 2005.

    Google Scholar 

  40. Helweg, D.A., Automatic detection and species identification of blue and fin whale calls, In Bioacoustics, vol. 13, p. 96, 2002.

    Google Scholar 

  41. Hennig, R.M., Acoustic feature extraction by cross-correlation in crickets, In Journal of Comparative Physiology. A, Neuroethology, Sensory, Neural, and Behavioral Physiology, vol. 189, pp. 589-598, 2003.

    Article  Google Scholar 

  42. Oba, T., Application of automated bioacoustic identification in environmental education and assessment, In Anais da Academia Brasileira de Cincias, vol. 76, pp. 446-451, 2004.

    Google Scholar 

  43. Potamitis, I., Ganchev, T., Fakotakis, N., Automatic acoustic identifica- tion of insects inspired by the speaker recognition paradigm, In Proceedings of the Interspeech-ICSLP’06, Pittsburg PA, USA, paper 1505-Wed3CaP.13, September 17-21, 2006.

    Google Scholar 

  44. Skowronski, M., Harris, J., Acoustic detection and classification of microchi-roptera using machine learning: Lessons learned from automatic speech recog-nition, In Journal of the Acoustical Society of America, vol. 119, pp. 1817-1833, 2006.

    Article  Google Scholar 

  45. Alexander, R., Sound production and associated behavior in insects, In The Ohio Journal of Science, vol. 57, no. 2, pp. 101-113, 1957.

    Google Scholar 

  46. Bennett-Clark, H., Resonators in insect sound production: how insects produce loud pure-tone songs, In Journal of Experimental Biology, vol. 202, pp. 3347-3357,1999. 3 Generalized Recognition of Sound Events: Approaches and Applications73

    Google Scholar 

  47. Martin, K., Sound-source recognition: a theory and computational model, Ph.D. Thesis, MIT, Media Lab, 1999.

    Google Scholar 

  48. Ashiya, T., Hagiwara, M., Nakagawa, M., IOSES: An indoor observation sys-tem based on environmental sounds recognition using a neural network, In Transactions of the Institute of Electrical Engineers of Japan, vol. 116-C, no. 3, pp. 341-349, 1996.

    Google Scholar 

  49. Cowling, M., Sitte, R., Comparison of techniques for environmental sound recognition, In Pattern Recognition Letters, vo1. 24, no. 15, pp. 2895-2907, 2003.

    Article  Google Scholar 

  50. Goldhor, R.S., Recognition of environmental sounds, In Proceedings of the ICASSP93, vol. 1, pp. 149-152, 1993.

    Google Scholar 

  51. Arrigoni, J.E., An evaluation of amphibian monitoring approaches in the maya forest, Chapter 3: An assessment of the vocalization survey method for mon-itoring anuran populations in the Maya Forest, Master thesis, pp. 21-42, February, 2003.

    Google Scholar 

  52. Lee, C.-H., Chou, C.-H., Han, C.-C., Huang, R.-Z., Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis, In Pattern Recognition Letters, vol. 27, pp. 93-101, 2006.

    Article  Google Scholar 

  53. Mitrovic, D., Zeppelzauer, M., Discrimination and Retrieval of Animal Sounds, In Proceedings of the IEEE Multimedia Modelling Conference, Beijing, China, pp. 339-343, 2006.

    Google Scholar 

  54. Gaston, K., O’Neill, M.A., Automated species identification - why not? In Philosophical Transactions-Royal Society of London. Biological Sciences, vol. 359, no. 1444, pp. 655-667, 2004.

    Article  Google Scholar 

  55. Watson, A.T., O’Neill, M.A., Kitching, I.J., A qualitative study investigating automated identification of living macrolepidoptera using the Digital Auto-mated Identification SYstem (DAISY), In Systematics and Biodiversity, vol. 1, no. 1, 2003.

    Google Scholar 

  56. Chesmore, E., Application of time domain signal coding and artificial neural networks to passive acoustical identification of animals, In Applied Acoustics, vol. 62, pp. 1359-1374, 2001.

    Article  Google Scholar 

  57. Dietrich, C., Temporal sensor fusion for the classification of bioacoustic time se-ries, PhD thesis, University of Ulm, Department of Neural Information Process-ing, 2004.

    Google Scholar 

  58. Guo, Y.B., Ammula, S.C., Real-time acoustic emission monitoring for surface damage in hard machining, In International Journal of Machine Tools and Manufacture, vol. 45, pp. 1622-1627, 2005.

    Google Scholar 

  59. SrinivasaPai, P., Ramakrishna Rao, P.K., Acoustic emission analysis for tool wear monitoring in face milling, In International Journal Production Research, vol. 40, no. 5, pp. 1081-1093, 2002.

    Article  Google Scholar 

  60. Dornfeld, D.A., Manufacturing process monitoring and analysis using acoustic emission, In Journal Acoustic Emission, vol. 4, no. 2-3, pp. 123-126, 1985.

    Google Scholar 

  61. Dimla, D.E., Jr., Lister, P.M., Leighton, N.J., Neural network solutions to the tool condition monitoring problem in metal cutting. A critical review of methods, In International Journal of Machine Tools Manufacturing, vol. 37, no. 9, pp. 1219-1240, 1997.

    Article  Google Scholar 

  62. Diniz, A.E., Liu, J.J., Dornfeld, D.A., Correlating tool life, tool wear and sur-face roughness by monitoring acoustic emission in turning, In Wear, vol. 152, pp. 395-407, 1992.

    Article  Google Scholar 

  63. Diei, E.N., Dornfeld, D.A., Acoustic emission sensing of tool wear in face milling, In Transactions of ASME, Journal of Engineering for Industry, vol. 109, pp. 234-240, 1987.

    Article  Google Scholar 

  64. Kannatey-Asibu, E., Jr., Dornfeld, D.A., Quantitative relationships for acoustic emission from orthogonal metal cutting, In Transactions of ASME, Journal of Engineering for Industry, vol. 103, pp. 330-339, 1981.

    Article  Google Scholar 

  65. Carolan, T.A., Kidd, S.R., Hand, D.P., Wilcox, S.J., Wilkinson, P., Barton, J.S., Jones, J.D.C., Reuben, R.L., Acoustic emission monitoring of tool wear during the face milling of steels and aluminium alloys using a fiber optic sensor energy analysis, In Proceedings of the Institution of Mechanical Engineers, 211(B), pp. 299-309, 1997.

    Article  Google Scholar 

  66. Iwata, K., Moriwaki, T., An application of acoustic emission measurements to in process sensing of tool wear, In Annals of the CIRP, vol. 25, no. 1, pp. 21-26, 1977.

    Google Scholar 

  67. Sampath, A., Vajpayee, S., Tool health monitoring using acoustic emission, In International Journal of Production Research, vol. 25, no. 5, pp. 703-719, 1987.

    Article  Google Scholar 

  68. Lister, P.M., Barrow, G., Tool condition monitoring systems, In Proceedings of the 26th International Machine Tool Design and Research Conference, pp. 271-288,1986.

    Google Scholar 

  69. Inasaki, I., Application of acoustic emission sensor for monitoring machining processes, In Ultrasonics, vol. 36, pp. 273-281, 1998.

    Article  Google Scholar 

  70. Eronen, A.J., Peltonen, V.T., Tuomi, J.T., Klapuri, A.P., Fagerlund, S., Sorsa, T., Lorho, G., Huopaniemi, J., Audio-Based Context Recognition, In IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, January 2006.

    Google Scholar 

  71. Gellersen, H.-W., Schmidt, A., Beigl, M., Adding some smartness to devices and everyday things, In Proceedings of the Third IEEE Workshop on Mobile Computing Systems and Applications, pp. 3-10, 2005.

    Google Scholar 

  72. Vemuri, S., Schmandt, C., Bender, W., Tellex, S., Lassey, B., An audio-based personal memory aid, In Proceedings of the 6th International Conference Ubiq-uitous Computing, Ubicomp’04, pp. 400-417, 2004.

    Google Scholar 

  73. Chu, S., Narayanan, S., Jay Kuo, C.-C., Content analysis for acoustic en-vironment classification in mobile robots, In Proceedings of AAAI 2006 Fall Symposium, Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems, Arlington, VA, October 2006.

    Google Scholar 

  74. Clarkson, B., Sawhney, N., Pentland, A., Auditory context awareness via wear-able computing, In Proceedings of the Workshop on Perceptual User Interfaces, November 1998.

    Google Scholar 

  75. Képesi, M., Weruaga, L., Adaptive chirp-based time-frequency analysis of speech signals, In Speech Communication, vol. 48, no. 5, pp. 474-492, 2006.

    Article  Google Scholar 

  76. Gopalan, K., Speech modification by selective fourier-bessel series expansion of speech signals, In IEEE Pacific Rim Conference on Communications, Com-puters and Signal Processing, pp. 388-392, 1999.

    Google Scholar 

  77. Irino, T., Patterson, R.D., Stabilised wavelet Mellin transform: An auditory strategy for normalising sound-source size, In Proceedings of the Eurospeech ’99, Budapest, pp. 1899-1902, Hungary, 1999.

    Google Scholar 

  78. Wolfe, P.J., Godsill, S.J., Ng, W.-J., Bayesian variable selection and regularisa-tion for time-frequency surface estimation, In Journal of The Royal Statistical Society Series B, Royal Statistical Society, vol. 66, no. 3, pp. 575-589, 2004.

    Article  MATH  MathSciNet  Google Scholar 

  79. Hong, L., Rosca, J., Balan, R., Bayesian single channel speech enhancement exploiting sparseness in the ICA domain, In Proceedings of the EUSIPCO 2004, Vienna, Austria, September 2004.

    Google Scholar 

  80. Mossing, J.C., Tuthill, T.A., Reduced interference distributions for the detec-tion andclassification of outside sound source acoustic emissions, In Proceedings of the ICASSP’96, vol. 5, pp. 2758-2761, 1996.

    Google Scholar 

  81. Tzanetakis, G., Essl, G., Cook, P.R., Audio analysis using the discrete wavelet transform, In Proceedings of WSES International Conference, Acoustics and Music: Theory and Applications (AMTA), Skiathos, Greece, 2001.

    Google Scholar 

  82. Purat, M., Noll, P., Audio coding with a dynamic wavelet packet decomposition based on frequency-varying modulated lapped transforms, In Proceedings of the ICASSP’96, vol. 2, pp. 1021-1024, 1996.

    Google Scholar 

  83. Davis, S.B., Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, In IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, no.4, pp. 357-366, 1980.

    Article  Google Scholar 

  84. Kim, H., Moreau, N., Sikora, T., Audio classification based on MPEG-7 spec-tral basis representations, In IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp. 716-725, 2004.

    Article  Google Scholar 

  85. Allamanche, E., Herre, J., Hellmuth, O., Fröba, B., Kastner, T., Cremer, M., Content-based identification of audio material using MPEG-7 low-level de-scription. In Proceedings of the International Conference on Music Information Retrieval, 2001.

    Google Scholar 

  86. Quackenbush, S., Lindsay, A., Overview of MPEG-7 audio, In IEEE Transac-tions on Circuits Systems for Video Technology, vol. 11, pp. 725-729, 2001.

    Article  Google Scholar 

  87. Peeters, G., McAdams, S., Herrera, P. Instrument sound description in the context of MPEG-7, In Proceedings of the International Conference on Music and Computers (ICMC), Berlin, Germany, 2000.

    Google Scholar 

  88. Kim, H., Sikora, T., Comparison of MPEG-7 audio spectrum projection fea-tures and MFCC applied to speaker recognition, sound classification and audio segmentation, In Proceedings of the ICASSP’04, vol. 5, pp. 925-928, 2004.

    Google Scholar 

  89. Casey, M., MPEG-7 sound recognition tools, In IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 737-747, 2001.

    Article  Google Scholar 

  90. Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, S.T., Comparing MFCC and MPEG-7 audio features for feature extraction, Maximum Likelihood HMM and Entropic Prior HMM for sports audio classification, In Proceedings of the International Conference on Multimedia and Expo, vol. 3, pp. 397-400, 2003.

    Google Scholar 

  91. Haeb-Umbach, R., Ney, H., Linear discrimination analysis for improved large vocabulary continuous speech recognition, In Proceedings of the ICASSP’92, pp. 113-116, 1992.

    Google Scholar 

  92. Tokuhira, M., Ariki, Y., Effectiveness of KL-transformation in spectral delta expansion, In Proceedings of the Eurospeech’99, vol. 1, pp. 359-362, 1999.

    Google Scholar 

  93. Saul, L.K., Rahim, M.G., Maximum likelihood and minimum classification error factor analysis for automatic speech recognition, In IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, pp. 115-125, March 2000.

    Article  Google Scholar 

  94. Casey, M.A., Reduced-rank spectra and minimum-entropy priors as consis-tent and reliable cues for generalized sound recognition, In Proceedings of the Workshop on Consistent and Reliable Acoustic Cues for Sound Analysis, Eurospeech’01, Aalborg, Denmark, 2001.

    Google Scholar 

  95. Lee, T.-W., Jang, G.-J., The statistical structure of male and female speech signals, In Proceedings of the ICASSP’01, vol. 1, pp. 105-108, May 2001.

    Google Scholar 

  96. Eisele, T., Haeb-Umbach, R., Langmann, D., A comparative study of linear feature transformation techniques for automatic speech recognition, In Pro-ceedings of the ICSLP’96, pp. 252-255, 1996.

    Google Scholar 

  97. Battle, E., Nadeu, C., Fonollosa, J., Feature decorrelation methods in speech recognition. A comparative study, In Proceedings of the ICSLP’98, pp. 951-954,1998.

    Google Scholar 

  98. Bayes, T., An essay towards solving a problem in the doctrine of chances, In Philosophical Transactions of the Royal Society of London, vol. 53, pp. 370-418,1763.

    Article  Google Scholar 

  99. Fisher, R.A., The use of multiple measurements in taxonomic problems, In Annals of Eugenics, vol. 7, pp. 179-188, 1936.

    Google Scholar 

  100. Specht, D.F., Generation of polynomial discriminant functions for pattern recognition, In IEEE Transactions on Electronic Computers, vol. 16, pp. 308-319,1967.

    Article  MATH  Google Scholar 

  101. Lang, K.J., Hinton, G.E., A time delay neural network architecture for speech recognition, Technical Report CMU-cs-88-152, Carnegie Mellon University, Pittsburgh PA, 1988.

    Google Scholar 

  102. Jordan, M.I., Serial order: A parallel distributed processing approach, Institute for Cognitive Science, Report 8604, University of California, San Diego, 1986.

    Google Scholar 

  103. Elman, J.L., Finding structure in time, In Cognitive Science, vol. 14, pp. 179-211,1990.

    Article  Google Scholar 

  104. Rosenblatt, F., The perceptron: a probabilistic model for information stor- age and organization in the brain, In Psychological Review, vol. 65, pp. 386-408,1958.

    Article  MathSciNet  Google Scholar 

  105. Vapnik, V.N., The Nature of Statistical Learning Theory, Springer, 1995.

    Google Scholar 

  106. Specht, D.F., Probabilistic neural networks for classification, mapping, or as- sociative memory, In Proceedings of the IEEE Conference on Neural Networks, San Diego, vol. 1, pp. 525-532, July 1988.

    Article  Google Scholar 

  107. Hansen, L.P., Large sample properties of generalized method of moments esti- mation, In Econometrica, vol. 50, pp. 1029-1054, 1982.

    Article  MATH  Google Scholar 

  108. Baum, L.E., Petrie, T., Statistical inference for probabilistic functions of finite state markov chains, In Annals of Mathematical Statistics, vol. 37, pp. 1554-1563,1966.

    Article  MATH  MathSciNet  Google Scholar 

  109. Cover, T., Hart, P., Nearest neighbour pattern classification, In IEEE Trans- actions on Information Theory, vol. 13, pp. 21-27, 1967.

    Article  MATH  Google Scholar 

  110. Kohonen, T., Learning vector quantization for pattern recognition, Technical Report TKK-F-A601, Helsinki University of Technology, 1986.

    Google Scholar 

  111. Powell, M.J.D., Radial basis Functions for Multivariable Interpolation: A Re- view, In Mason, J., Cox, M. (Eds.), Algorithms for Approximation, Oxford, Clarendon Press, pp. 143-167, 1987.

    Google Scholar 

  112. Bengio, S., Mariethoz. J., Learning the decision function for speaker verifica- tion, Technical Report, IDIAP Research Report 00-40, IDIAP, January 2001.

    Google Scholar 

  113. Bourlard, H.A., Morgan, N., Connectionist speech recognition: A hybrid ap-proach, Kluwer, 1994.

    Google Scholar 

  114. Neto, J., Almeida, L., Hochberg, M., Martins, C., Nunes, L., Renals, S., Robinson, T., Speaker adaptation for hybrid HMM/ANN continuous speech recognition system, In Proceedings of the Eurospeech’95, pp. 2171-2174, 1995.

    Google Scholar 

  115. Bengio, Y., Frasconi, P., Input-output HMM’s for sequence processing, In IEEE Transactions on Neural Networks, vol. 7, no. 5, pp. 1231-1249, 1996.

    Article  Google Scholar 

  116. Setlur, A.R., Sukkar R.A., Jacob J., Correcting recognition errors via discrim-inative utterance verification, In Proceedings of ICSLP’96, Philadelphia, USA, vol. 2, pp. 602-605, 1996.

    Google Scholar 

  117. Ganchev, T., Tasoulis, D.K., Vrahatis, M.N., Fakotakis, N., Locally recur- rent probabilistic neural network for text-independent speaker verification, In Proceedings of the Eurospeech’03, Geneva, Switzerland, vol. 3, pp. 1673-1676, September 1-4, 2003.

    Google Scholar 

  118. Ganchev, T., Tasoulis, D.K., Vrahatis, M.N., Fakotakis, N., Generalized lo- cally recurrent probabilistic neural networks for text-independent speaker verification, In Proceedings of the ICASSP’04, Montreal, Quebec, Canada, vol. 1, pp. 41-44, May 17-21, 2004.

    Google Scholar 

  119. Ganchev, T., Tasoulis, D.K., Vrahatis, M.N., Fakotakis N., Generalized locally recurrent probabilistic neural networks with application to text-independent speaker verification, In Neurocomputing, vol. 70, no. 7-9, pp. 1424-1438, 2007.

    Article  Google Scholar 

  120. Liu, M., Wan, C., A study on content-based classification and retrieval of audio database, In International Database Engineering and Applications Symposium (IDEAS ’01), ISSN:1098-8068, p. 339, 2001.

    Google Scholar 

  121. Guo, X., Yan, Y., Xiao, Y.S., Xiao, S.-C., Heart sound recognition algorithm based on pnn for evaluating cardiac contractility change trend, In Journal of Biomedical Engineering, vol. 23, no. 5, 2006.

    Google Scholar 

  122. Barry, S.J., Dane1, A.D., Morice, A.H., Walmsley, A.D., The automatic recog- nition and counting of cough, In Cough, vol. 2, no. 8, 2006.

    Google Scholar 

  123. Chordia, P., Segmentation and recognition of tabla strokes, In Proceedings of the 6th International Conference on Music Information Retrieval, London, UK, 11-15 September, 2005.

    Google Scholar 

  124. Bolat, B., Kucuk, U., Musical sound recognition by active learning PNN, In Lecture Notes in Computer Science, vol. 4105/2006, Multimedia Content Representation, Classification and Security, ISSN:0302-9743, Springer, Berlin Heidelberg New York, 2006.

    Google Scholar 

  125. Kraft, F., Malkin, R., Schaaf, T., Waibel, A., Temporal ICA for classification of acoustic events in a kitchen environment, In Proceedings of the Interspeech’05, Lisbon, Portugal, 2005.

    Google Scholar 

  126. Ravindran, S., Anderson, D.V., Audio classification and scene recognition for hearing aids, In IEEE International Symposium on Circuits and Systems, ISCAS’05, vol. 2, pp. 860-863, 2005.

    Article  Google Scholar 

  127. Temko, A., Nadeu, C., Classification of acoustic events using SVM-based clustering schemes, In Pattern Recognition, ISSN:0031-3203, vol. 39, no. 4, pp. 682-694, April 2006.

    Article  MATH  Google Scholar 

  128. Dufaux, A., Besacier, L., Ansorge, M., Pellandini, F., Automatic sound detec- tion and recognition for noisy environment, In Proceedings of the EUSIPCO 2000, Tampere, Finland, 2000.

    Google Scholar 

  129. Yella, S., Gupta, N.K., Dougherty, M., Pattern recognition approach for the automatic classification of data from impact acoustics, In Proceedings of the AISC’2006, Palma De Mallorca, Spain, pp. 144-149, August 28-30, 2006.

    Google Scholar 

  130. Chu, S., Narayanan, S., Jay Kuo, C.-C., Matarić, M.J., Where am I? Scene recognition for mobile robots using audio features, In Proceedings of the ICME’06, pp. 885-888, 2006.

    Google Scholar 

  131. Essid, S., Classification of audio signals: machine recognition of musical instru- ments, Seminars, CNRS-LTCI, 2006.

    Google Scholar 

  132. Casey, M., General sound classification and similarity in MPEG-7, In Organised Sound, vol. 6, no. 2, pp. 153-164, 2001.

    Article  MathSciNet  Google Scholar 

  133. Casey, M., MPEG-7 sound recognition tools, In IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 737-747, 2001.

    Article  Google Scholar 

  134. Peltonen, V., Tuomi, J., Klapuri, A., Huopaniemi, J., Sorsa, T., Computational auditory scene recognition, In Proceedings of the ICASSP’02, vol. 2, pp. 1941-1944,2002.

    Google Scholar 

  135. Sitte, R., Willets, L., Non-speech environmental sound identification for surveil- lance using self-organizing-maps, In Proceeding of the SPPRA 2007, Innsbruck, Austria, February 14-16, 2007.

    Google Scholar 

  136. Harlow, C., Wang, Y., Acoustic accident detection system, In Journal In- telligent Transportation Systems, Taylor & Francis, ISSN:1024-8072, vol. 7, pp. 43-56, January 2002.

    MATH  Google Scholar 

  137. Yella, S., Gupta, N.K., Dougherty, M., Condition monitoring using pattern recognition techniques on data from acoustic emissions, In Proceedings of the ICMLA’06, pp. 3-9, 2006.

    Google Scholar 

  138. Toyoda, Y., Huang, J., Ding, S., Liu, Y., Environmental sound recognition by the instantaneous spectrum combined with the time pattern of power, In Proceedings of the 2nd IASTED International Conference on Neural Networks and Computational Intelligence, NCI 2004, pp. 169-172, 2004.

  139. Coath, M., Denham, S.L., Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience, In Biological Cybernetics, vol. 93, no. 1, pp. 22-30, July, 2005.

  140. Li, Y., Dorai, C., SVM-based audio classification for instructional video analysis, In Proceedings of the ICASSP’04, Montreal, Canada, vol. 5, pp. 897-900, 2004.

  141. Lin, C.-C., Chen, S.-H., Truong, T.-K., Chang, Y., Audio classification and categorization based on wavelets and support vector machine, In IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, September 2005.

  142. Chen, L., Gunduz, S., Ozsu, M.T., Mixed type audio classification with support vector machine, In IEEE International Conference on Multimedia and Expo, ICME’06, pp. 781-784, July 2006.

  143. McLachlan, G.J., Krishnan, T., The EM algorithm and extensions, Wiley Series in Probability and Statistics, New York, Wiley, 1997.

  144. Hartigan, J.A., Wong, M.A., A k-means clustering algorithm, In Applied Statistics, vol. 28, no. 1, pp. 100-108, 1979.

  145. Meisel, W., Computer-Oriented Approaches To Pattern Recognition, Academic Press, New York, 1972.

  146. Cain, B.J., Improved probabilistic neural network and its performance relative to the other models, In Proceedings of the SPIE, Applications of Artificial Neural Networks, vol. 1294, pp. 354-365, 1990.

  147. Musavi, M., Kalantri, K., Ahmed, W., Improving the performance of probabilistic neural networks, In Proceedings of IEEE International Joint Conference on Neural Networks, Baltimore, MD, USA, vol. 1, pp. 595-600, June 7-11, 1992.

  148. Abe, S., Support Vector Machines for Pattern Classification, Springer, Berlin Heidelberg New York, London, 2005.

  149. Hansen, L.K., Salamon, P., Neural Network Ensembles, In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 993-1001, October 1990.

  150. Ho, T.K., Hull, J.J., Srihari, S.N., Decision combination in multiple classifier systems, In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, January 1994.

  151. Breiman, L., Bagging predictors, In Machine Learning, vol. 24, pp. 123-140, 1996.

  152. Dietterich, T., An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, In Machine Learning, pp. 1-22, 1998.

  153. Kittler, J., Hatef, M., Duin, R., Matas, J., On combining classifiers, In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, 1998.

  154. Alkoot, F.M., Kittler, J., Experimental evaluation of expert fusion strategies, In Pattern Recognition Letters, vol. 20, no. 11, pp. 11-13, 1999.

  155. Kittler, J., Alkoot, F.M., Sum versus vote fusion in multiple classifier systems, In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 1, pp. 110-115, 2003.

  156. Xu, L., Krzyzak, A., Suen, C.Y., Methods of combining multiple classifiers and their applications to handwriting recognition, In IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 3, pp. 418-435, 1992.

  157. Jordan, M.I., Jacobs, R.A., Hierarchical mixtures of experts and the EM algorithm, In Neural Computation, no. 6, pp. 181-214, 1994.

  158. Hinton, G.E., Sallans, B., Ghahramani, Z., A Hierarchical Community of Experts, In Jordan, M.I.(Ed.), Learning in Graphical Models, Kluwer, pp. 479-494, 1998.

  159. Dietterich, T., Ensemble Methods in Machine Learning, In Kittler, J., Roli, F. (Eds.), Multiple Classifier Systems, pp. 1-15, 2000.

  160. Ganchev, T., Tsopanoglou, A., Fakotakis, N., Kokkinakis, G., Probabilistic neural networks combined with GMMs for speaker recognition over telephone channels, In Proceedings of the DSP2002, Santorini, Greece, vol. 2, pp. 1081-1084, July 1-3, 2002.

  161. Potamitis, I., Ganchev, T., Fakotakis, N., Automatic acoustic identification of crickets and cicadas, In Proceedings of the ISSPA’07, February 12-15, 2007.

  162. Bishop, C., Pattern Recognition and Machine Learning, Springer, Berlin Heidelberg New York, 2006.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Potamitis, I., Ganchev, T. (2008). Generalized Recognition of Sound Events: Approaches and Applications. In: Tsihrintzis, G.A., Jain, L.C. (eds) Multimedia Services in Intelligent Environments. Studies in Computational Intelligence, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78502-6_3

  • DOI: https://doi.org/10.1007/978-3-540-78502-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78491-3

  • Online ISBN: 978-3-540-78502-6

  • eBook Packages: Engineering, Engineering (R0)
