High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

Liu, Lijuan; Li, Haojie; Sun, Fuming; Yin, Yaomin; Liu, Chenxin

doi:10.1007/978-3-319-03731-8_61


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8294)

Included in the following conference series:

Pacific-Rim Conference on Multimedia


Abstract

Semantic concept detection is a fundamental problem with many practical applications, such as concept-based video retrieval. Its major challenge is the well-known semantic gap between low-level visual features and the user's semantic interpretation of visual data. To bridge this gap, we propose to promote low-level visual features to a middle-level representation, expecting that the underlying semantic aspects of the image data can be discovered and that such latent aspects can better model image semantics. Specifically, latent Dirichlet allocation (LDA) is adopted to cluster the image data into semantic topics, and the distributions over these topics are used as the middle-level feature vectors of each image. In addition, the Fisher Vector (FV), a recently developed and more efficient probabilistic representation of low-level features, is used to complement the LDA representation for video concept detection. Experimental results on the TRECVID 2013 Semantic Indexing dataset demonstrate the effectiveness of the proposed approach.
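The abstract combines two encodings of the same local descriptors: LDA topic proportions (middle level) and a Fisher Vector (low level). Below is a minimal NumPy sketch of the Fisher Vector part and of one possible way to combine it with a topic vector. It rests on several assumptions. The diagonal-covariance GMM parameters are taken as already known; in practice they would be fitted by EM on training descriptors. The gradient formulas and the power/L2 normalization follow the "improved" Fisher Vector of Perronnin et al. Fusion is done by simple concatenation. The helper names `fisher_vector` and `fuse_features` are hypothetical, and the available preview does not confirm that the chapter uses these exact choices.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode local descriptors X (N x D) as a Fisher Vector with respect to
    a diagonal-covariance GMM with K components.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a 2*K*D vector: gradients w.r.t. the means and the std devs.
    """
    X = np.atleast_2d(X)
    N, D = X.shape
    diff = X[:, None, :] - means[None, :, :]                       # (N, K, D)
    # Log-density of each descriptor under each diagonal Gaussian.
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1)[None])
    # Soft assignments (posteriors), computed stably in log space.
    log_post = np.log(weights)[None] + log_gauss                   # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    u = diff / np.sqrt(variances)[None]                            # (x - mu) / sigma
    g_mu = np.einsum('nk,nkd->kd', gamma, u) / (N * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('nk,nkd->kd', gamma, u ** 2 - 1) / (N * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power ("signed square root") normalization, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

def fuse_features(topic_dist, fv):
    """Hypothetical early fusion: L2-normalize the LDA topic vector and
    concatenate it with the (already normalized) Fisher Vector."""
    t = np.asarray(topic_dist, dtype=float)
    t = t / (np.linalg.norm(t) or 1.0)
    return np.concatenate([t, fv])

# Toy usage with random stand-ins for real descriptors and GMM parameters.
rng = np.random.default_rng(0)
K, D = 4, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
descriptors = rng.normal(size=(100, D))   # stand-in for SIFT-like local descriptors
fv = fisher_vector(descriptors, weights, means, variances)
topics = np.array([0.7, 0.2, 0.1])        # stand-in for an LDA topic distribution
x = fuse_features(topics, fv)
print(fv.shape, x.shape)                  # (64,) (67,)
```

The FV has 2KD dimensions (gradients with respect to K means and K standard deviations), so it grows quickly with vocabulary size. The LDA vector, by contrast, has only as many dimensions as there are topics. Normalizing each part before concatenation is one way to keep the short topic vector from being swamped. Training separate classifiers and fusing their scores (late fusion) is an equally plausible alternative.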



Author information

Authors and Affiliations

Dalian University of Technology, China
Lijuan Liu, Haojie Li & Chenxin Liu
Liaoning University of Technology, China
Fuming Sun
Agricultural Bank of China, China
Yaomin Yin


Editor information

Editors and Affiliations

EURECOM, Multimedia Department, Sophia Antipolis, France
Benoit Huet
Department of Computer Science, City University of Hong Kong, Tat Chee Ave, Kowloon, Hong Kong
Chong-Wah Ngo
Nanjing University of Science and Technology, 210093, Nanjing, China
Jinhui Tang
Department of Computer Science and Technology, Nanjing University, Xianlin Avenue No. 163, 210023, Nanjing, China
Zhi-Hua Zhou
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Alexander G. Hauptmann
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117583, Singapore, Singapore
Shuicheng Yan


About this paper

Cite this paper

Liu, L., Li, H., Sun, F., Yin, Y., Liu, C. (2013). High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_61


DOI: https://doi.org/10.1007/978-3-319-03731-8_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03730-1
Online ISBN: 978-3-319-03731-8
eBook Packages: Computer Science, Computer Science (R0)
