
Auditory Context Recognition Combining Discriminative and Generative Models

  • Conference paper
Advances in Multimedia Information Processing – PCM 2013 (PCM 2013)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8294)


Abstract

This paper considers the task of recognizing the category of the context surrounding an audio sensor. Unlike structured audio data such as speech or music, auditory contexts and their constituent environmental sounds are unstructured and diverse, which makes their recognition difficult and has left the problem comparatively under-explored. In this paper, we propose an ensemble recognition scheme for unstructured auditory contexts based on the Hough forest framework, which combines discriminative and generative modeling of the context. We learn an effective audio feature representation for the environmental sounds in the context with the LDB algorithm, and recognize the context using a Hough-forest-based ensemble classifier that aggregates both the segmental and the contextual probabilistic votes on the context category cast by the segments of the auditory context. The experimental results demonstrate the effectiveness of the proposed approach for auditory context recognition.
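To make the segment-voting idea concrete, below is a minimal illustrative sketch of the aggregation step only: each segment of a test recording casts a probabilistic vote on the context category, and the votes are summed to produce the final decision. The feature extractor (simple summary statistics) and the classifier (scikit-learn's RandomForestClassifier) are stand-in assumptions chosen for brevity; they are not the LDB features or the Hough forest used in the paper.

```python
# Illustrative sketch of per-segment probabilistic voting for context recognition.
# NOT the paper's method: placeholder features and an ordinary random forest are
# used instead of LDB features and a Hough forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def segment_audio(signal, sr, seg_len_s=1.0):
    """Split a 1-D audio signal into fixed-length segments (placeholder segmentation)."""
    seg_len = int(seg_len_s * sr)
    n_segs = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]


def extract_features(segment):
    """Placeholder per-segment feature vector (mean, std, mean absolute difference)."""
    return np.array([segment.mean(), segment.std(), np.abs(np.diff(segment)).mean()])


rng = np.random.default_rng(0)

# Training data: one row of features per labelled segment (dummy values here).
X_train = rng.normal(size=(200, 3))
y_train = rng.integers(0, 4, size=200)       # four dummy context categories
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Recognition: every segment votes with its class-probability estimate, and the
# votes are summed over the whole recording to pick the context category.
test_signal = rng.normal(size=16000 * 10)    # dummy 10 s recording at 16 kHz
segments = segment_audio(test_signal, sr=16000)
X_test = np.vstack([extract_features(s) for s in segments])
votes = forest.predict_proba(X_test).sum(axis=0)
predicted_context = forest.classes_[int(np.argmax(votes))]
print("Predicted context category:", predicted_context)
```

The sketch only mirrors the aggregation of segmental votes into a context-level decision; the paper additionally combines contextual (generative) votes, which are not modeled here.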




Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Su, F., Yang, L. (2013). Auditory Context Recognition Combining Discriminative and Generative Models. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_56

  • DOI: https://doi.org/10.1007/978-3-319-03731-8_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03730-1

  • Online ISBN: 978-3-319-03731-8

