Abstract
This paper employs convolutional neural networks with a pooling module to extract view descriptors of 3D models and proposes the Group-Pair Deep Feature Learning method for multi-view 3D model retrieval. In this method, the view descriptors are passed through a supervised autoencoder and a multi-label discriminator to further mine the latent features and category features of each 3D model. To enhance the discriminative capability of the model features, we introduce the Margin Center Loss, which minimizes the intra-class distance and maximizes the inter-class distance. Experimental results on the ModelNet10 and ModelNet40 datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods.
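The idea of a loss that pulls features toward their own class center while pushing them away from other centers can be illustrated with a minimal sketch. The abstract does not give the exact formulation of the Margin Center Loss, so everything below is an assumption made for illustration: the function name `margin_center_loss`, the use of squared Euclidean distance, the hinge on the nearest other-class center, and the `margin` parameter are hypothetical choices, not the authors' definition.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def margin_center_loss(features, labels, centers, margin=1.0):
    """Illustrative margin-based center loss (assumed form, not the paper's).

    For each feature vector:
      - intra term: squared distance to the center of its own class
        (smaller means tighter classes);
      - inter term: hinge penalty when the nearest center of another
        class is closer than `margin` (pushes classes apart).
    Returns the mean of (intra + inter) over the batch.
    """
    total = 0.0
    for f, y in zip(features, labels):
        intra = sq_dist(f, centers[y])
        nearest_other = min(
            sq_dist(f, c) for j, c in enumerate(centers) if j != y
        )
        total += intra + max(0.0, margin - nearest_other)
    return total / len(features)


# Toy data: two class centers in 2D and three labeled features.
centers = [[0.0, 0.0], [4.0, 0.0]]
features = [[0.0, 1.0], [4.0, 0.0], [2.0, 0.0]]
labels = [0, 1, 0]

loss = margin_center_loss(features, labels, centers, margin=5.0)
print(loss)
```

In this toy batch only the third feature, which lies midway between the two centers, incurs the hinge penalty; the other two are already farther than the margin from the wrong center. In practice the centers would typically be learned jointly with the network rather than fixed as they are here.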
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61702310, 62076153), the Major Fundamental Research Project of Shandong, China (No. ZR2019ZD03), and the Taishan Scholar Project of Shandong, China.
About this article
Cite this article
Chen, X., Liu, L., Zhang, L. et al. Group-pair deep feature learning for multi-view 3d model retrieval. Appl Intell 52, 2013–2022 (2022). https://doi.org/10.1007/s10489-021-02471-7
DOI: https://doi.org/10.1007/s10489-021-02471-7