Abstract
Semantic concept detection is a fundamental problem with many practical applications, such as concept-based video retrieval. Its major challenge is the well-known semantic gap between low-level visual features and the user’s semantic interpretation of visual data. To bridge this gap, we propose promoting low-level visual features to a middle-level representation, expecting that the underlying semantic aspects of the image data can be discovered and that these latent aspects better model the semantics of images. Specifically, latent Dirichlet allocation (LDA) is adopted to cluster the image data into semantic topics, and the distributions over these topics are used as the middle-level feature vectors of the images. In addition, the Fisher Vector (FV), a recently developed and more efficient probabilistic representation of low-level features, is used to complement the LDA representation for video concept detection. Experimental results on the TRECVID 2013 Semantic Indexing dataset demonstrate the effectiveness of the proposed approach.
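As a rough illustration of the Fisher Vector representation mentioned above, the following minimal sketch encodes a set of local descriptors as gradients with respect to the means and standard deviations of a diagonal-covariance Gaussian mixture model. It follows the commonly used formulation of the encoding and is not taken from the chapter itself: the function name `fisher_vector`, the list-based data layout, and the toy GMM parameters are assumptions made for illustration, and the power and L2 normalization steps often applied afterwards are omitted.

```python
import math


def fisher_vector(descriptors, weights, means, sigmas):
    """Fisher vector of local descriptors under a diagonal-covariance GMM.

    descriptors: list of T descriptors, each a list of D floats
    weights:     list of K mixture weights (summing to 1)
    means:       K lists of D component means
    sigmas:      K lists of D component standard deviations
    Returns a list of length 2*K*D: mean gradients, then std-dev gradients.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]

    for x in descriptors:
        # Log of w_k * N(x; mu_k, diag(sigma_k^2)) for every component.
        log_p = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                lp += -0.5 * z * z - math.log(sigmas[k][d]) - 0.5 * math.log(2.0 * math.pi)
            log_p.append(lp)

        # Soft assignments (posteriors), computed stably via max subtraction.
        m = max(log_p)
        unnorm = [math.exp(lp - m) for lp in log_p]
        total = sum(unnorm)

        # Accumulate the per-component gradient statistics.
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z
                grad_sigma[k][d] += gamma * (z * z - 1.0)

    # Normalize by the number of descriptors and the mixture weights.
    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])
    return fv


# Toy usage: two 1-D descriptors and a two-component GMM (hypothetical values).
toy_fv = fisher_vector([[1.0], [-1.0]], [0.5, 0.5], [[1.0], [-1.0]], [[1.0], [1.0]])
```

In a pipeline of the kind the abstract describes, a vector like this would be computed per keyframe from its local descriptors and then used alongside the LDA topic distribution. How the two representations are combined is not stated in the abstract, so no fusion step is shown here.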
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, L., Li, H., Sun, F., Yin, Y., Liu, C. (2013). High-Level Video Semantic Concept Detection Based on Multi-level Feature Representations. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_61
DOI: https://doi.org/10.1007/978-3-319-03731-8_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03730-1
Online ISBN: 978-3-319-03731-8
eBook Packages: Computer Science, Computer Science (R0)