Abstract
How to measure the distance between heterogeneous data remains an open problem. Many works learn a common subspace in which the similarity between different modalities can be computed directly. However, most existing approaches focus on learning a latent subspace without preserving the semantically structural information well, so they cannot achieve the desired results. In this paper, we propose a novel framework, termed Cross-modal subspace learning via Kernel correlation maximization and Discriminative structure-preserving (CKD), which addresses this problem in two ways. First, we construct a shared semantic graph so that the data of each modality preserve the semantic neighborhood relationships. Second, we introduce the Hilbert-Schmidt Independence Criterion (HSIC) to ensure consistency between the feature similarity and the semantic similarity of samples. Our model not only captures the inter-modality correlation by maximizing the kernel correlation but also preserves the semantically structural information within each modality. Extensive experiments on three public datasets demonstrate that the proposed CKD is competitive with classic subspace learning methods.
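As a minimal sketch of the HSIC ingredient mentioned above: the widely used biased empirical estimator of HSIC between two kernel matrices is tr(KHLH)/(n-1)^2, where H is the centering matrix. The snippet below (illustrative only; the toy features and the linear kernels are assumptions, not the paper's actual setup) shows how HSIC is large for dependent modalities and near zero for independent ones.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC estimator: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))        # hypothetical "image" features
Y = X @ rng.standard_normal((10, 5))     # "text" features dependent on X
Z = rng.standard_normal((50, 5))         # features independent of X

dep = hsic(X @ X.T, Y @ Y.T)   # dependent modalities: large value
ind = hsic(X @ X.T, Z @ Z.T)   # independent modalities: near zero
```

Maximizing such a dependence measure between the learned feature kernel and a semantic (label) kernel is one standard way to enforce the feature-similarity/semantic-similarity consistency the abstract describes.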
Acknowledgments
This paper is supported by the National Natural Science Foundation of China (Grant Nos. 61672265 and U1836218) and the 111 Project of the Ministry of Education of China (Grant No. B12018).
Yu, J., Wu, XJ. Cross-modal subspace learning via kernel correlation maximization and discriminative structure-preserving. Multimed Tools Appl 79, 34647–34663 (2020). https://doi.org/10.1007/s11042-020-08989-1