Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9772)

Abstract

A growing variety of multimedia information, including text, voice, video and image, is used together on the Internet to describe the same semantic concept. This paper presents a new method for more efficient cross-modal multimedia retrieval. Using image and text as an example, we learn deep features of images with convolutional neural networks and learn text features with a latent Dirichlet allocation model. We then map the two feature spaces into a shared representation space by a probability model so that they are isomorphic. Finally, we adopt centered correlation to measure the distance between them. Experimental results on the Wikipedia dataset show that our approach achieves state-of-the-art results.
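As a rough illustration of the final step, the sketch below ranks candidates by a centered correlation distance. It assumes the common definition (one minus the Pearson correlation of the mean-centered vectors), which the abstract does not spell out, and it assumes both modalities have already been mapped to vectors of equal length in the shared space. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def centered_correlation_distance(x, y):
    """Return 1 - Pearson correlation of x and y (assumed definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each vector by subtracting its own mean.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0.0:
        # Correlation is undefined for a constant vector; treat as uncorrelated.
        return 1.0
    return 1.0 - float(xc @ yc) / denom

def rank_by_distance(query, candidates):
    """Return candidate indices ordered from nearest to farthest."""
    dists = [centered_correlation_distance(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: dists[i])

# Toy example: a text query and two image candidates in a 3-dimensional shared space.
text_query = [0.1, 0.2, 0.7]
image_candidates = [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
ranking = rank_by_distance(text_query, image_candidates)
```

The distance lies in the range 0 to 2: perfectly correlated vectors score 0 and perfectly anti-correlated vectors score 2, so a smaller value means a closer cross-modal match.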

Acknowledgement

This work was supported by the Grant of the National Science Foundation of China (No. 61175121, 61502183), the Grant of the National Science Foundation of Fujian Province (No. 2013J06014), the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-YX108), the Scientific Research Funds of Huaqiao University (No. 600005-Z15Y0016), and Subsidized Project for Cultivating Postgraduates’ Innovative Ability in Scientific Research of Huaqiao University (Nos. 1400214009, 1400214003).

Author information

Corresponding author

Correspondence to Ji-Xiang Du.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zou, H., Du, JX., Zhai, CM., Wang, J. (2016). Deep Learning and Shared Representation Space Learning Based Cross-Modal Multimedia Retrieval. In: Huang, DS., Jo, KH. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9772. Springer, Cham. https://doi.org/10.1007/978-3-319-42294-7_28

  • DOI: https://doi.org/10.1007/978-3-319-42294-7_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42293-0

  • Online ISBN: 978-3-319-42294-7

  • eBook Packages: Computer Science, Computer Science (R0)
