
High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations

  • Conference paper
Advances in Multimedia Information Processing – PCM 2013 (PCM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8294)


Abstract

Semantic concept detection is a fundamental problem with many practical applications, such as concept-based video retrieval. Its major challenge is the well-known semantic gap between low-level visual features and the user's semantic interpretation of visual data. To bridge this gap, we propose to lift low-level visual features to a middle-level representation, expecting that the latent semantic aspects of image data can be discovered and that these aspects model the semantics of images better. Specifically, latent Dirichlet allocation (LDA) is adopted to cluster the image data into semantic topics, and each image's distribution over these topics is used as its middle-level feature vector. Meanwhile, the Fisher Vector (FV), a recently developed and more efficient probabilistic representation of low-level features, is used to complement the LDA representation for video concept detection. Experimental results on the TRECVID 2013 Semantic Indexing dataset demonstrate the effectiveness of the proposed approach.
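The two representations named in the abstract can be illustrated with a minimal sketch. It assumes that each image has already been quantized into a list of visual-word ids (the input to LDA) and that a diagonal-covariance Gaussian mixture model (GMM) has already been fitted to the local descriptors (the input to the FV). LDA inference here uses collapsed Gibbs sampling, which is one standard choice and not necessarily what the authors used. The FV follows the common "improved" variant (gradients with respect to means and variances, then power and L2 normalization). The `fuse` function is a hypothetical early-fusion step, because the abstract does not say how the two representations are combined. All function names and hyperparameter values are illustrative assumptions.

```python
import numpy as np


def lda_topic_features(docs, vocab_size, n_topics, alpha=0.1, beta=0.01,
                       n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA on bag-of-visual-words images.

    docs: list of 1-D int arrays, each holding the visual-word ids of one image.
    Returns an (n_docs, n_topics) matrix of per-image topic proportions,
    used here as the middle-level feature vectors.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per image
    n_kw = np.zeros((n_topics, vocab_size))  # visual-word counts per topic
    n_k = np.zeros(n_topics)                 # total tokens per topic
    z = []                                   # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | all other assignments).
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    lengths = np.array([len(doc) for doc in docs])[:, None]
    return (n_dk + alpha) / (lengths + n_topics * alpha)


def fisher_vector(x, weights, means, variances):
    """Improved Fisher Vector of local descriptors x (T, D) under a diagonal GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a 2*K*D vector, power- and L2-normalized.
    """
    x = np.atleast_2d(x)
    T = x.shape[0]
    diff = x[:, None, :] - means[None, :, :]                       # (T, K, D)
    # Log-likelihood of each descriptor under each Gaussian component.
    log_prob = -0.5 * np.sum(diff ** 2 / variances
                             + np.log(2 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_prob                          # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)                # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # soft assignments
    u = diff / np.sqrt(variances)                                  # standardized offsets
    g_mu = np.einsum('tk,tkd->kd', gamma, u) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', gamma, u ** 2 - 1) / (T * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                         # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


def fuse(lda_feat, fv_feat):
    """Hypothetical early fusion: L2-normalize the LDA vector, then concatenate."""
    a = lda_feat / max(np.linalg.norm(lda_feat), 1e-12)
    return np.concatenate([a, fv_feat])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    docs = [rng.integers(0, 20, size=30) for _ in range(4)]
    theta = lda_topic_features(docs, vocab_size=20, n_topics=3, n_iter=20)
    K, D = 4, 8
    fv = fisher_vector(rng.normal(size=(50, D)), np.full(K, 1 / K),
                       rng.normal(size=(K, D)), np.ones((K, D)))
    print(theta.shape, fv.shape, fuse(theta[0], fv).shape)  # prints: (4, 3) (64,) (67,)
```

Each row of the LDA output sums to one, so it can be read directly as an image's distribution over topics. The FV grows as 2·K·D, which is why it carries much finer low-level detail than a K-topic LDA vector; whether the two are fused early (as in `fuse`) or by combining per-classifier scores is a design choice the abstract leaves open.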




Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, L., Li, H., Sun, F., Yin, Y., Liu, C. (2013). High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_61


  • DOI: https://doi.org/10.1007/978-3-319-03731-8_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03730-1

  • Online ISBN: 978-3-319-03731-8

  • eBook Packages: Computer Science, Computer Science (R0)
