Applications in Intelligent Sound Analysis

Chapter in: Intelligent Audio Analysis

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Apart from speech and music, general sound can also carry relevant information. This is, however, a considerably less researched field to date. Most prominent are the tasks of acoustic event detection and classification, which can be subsumed under computational auditory scene analysis. Fields of application include media retrieval (including affective content analysis), human-machine and human-robot interaction, animal vocalisation recognition, and monitoring of industrial processes. Here, three applications in real-life Intelligent Sound Analysis are given from the work of the author: audio-based animal recognition, acoustic event classification, and prediction of the emotion induced in sound listeners. In particular, weakly supervised learning techniques are presented to cope with the typical label sparseness in this field.

If you develop an ear for sounds that are musical it is like developing an ego. You begin to refuse sounds that are not musical and that way cut yourself off from a good deal of experience.

—John Cage.
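As a hedged illustration of the weakly supervised learning idea mentioned in the abstract, the following Python sketch shows a simple self-training loop for sound event classification: a classifier trained on the few labelled clips repeatedly auto-labels the unlabelled pool and re-trains on its most confident predictions. The function name, the SVM back-end, and the confidence threshold are illustrative assumptions; the sketch presumes pre-extracted per-clip feature vectors (e.g. openSMILE functionals) and is not the chapter's exact procedure.

```python
# Illustrative self-training sketch for sound event classification under
# label sparseness. Assumes each clip is already summarised as a fixed-length
# feature vector (e.g. openSMILE functionals); not the chapter's exact method.
import numpy as np
from sklearn.svm import SVC


def self_train(X_labelled, y_labelled, X_unlabelled,
               conf_threshold=0.9, max_rounds=5):
    """Return a classifier re-trained with confidently auto-labelled clips."""
    X_train = np.asarray(X_labelled)
    y_train = np.asarray(y_labelled)
    pool = np.asarray(X_unlabelled)

    # Initial model from the scarce labelled material.
    clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)               # class posteriors per clip
        confident = proba.max(axis=1) >= conf_threshold
        if not confident.any():
            break                                      # nothing reliable left to add
        # Move confidently predicted clips, with their predicted labels,
        # from the unlabelled pool into the training set.
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate(
            [y_train, clf.classes_[proba[confident].argmax(axis=1)]])
        pool = pool[~confident]
        # Re-train on the enlarged training set.
        clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

    return clf
```

In practice one would tune the confidence threshold on a development set and cap the number of instances added per class and round, to limit drift toward majority classes when the pseudo-labels are noisy.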




Author information

Correspondence to Björn Schuller.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schuller, B. (2013). Applications in Intelligent Sound Analysis. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_12

  • DOI: https://doi.org/10.1007/978-3-642-36806-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36805-9

  • Online ISBN: 978-3-642-36806-6

  • eBook Packages: Engineering, Engineering (R0)
