Abstract
Image retrieval is a computer vision task that browses, searches, and returns images from a large database of digital images in response to a query. Many works have focused on fine-grained object retrieval (FGOR) because it is both extremely challenging and of great practical value. Because fine-grained object data exhibit large variation within a class and small variation between classes, a convolutional neural network (CNN) is commonly used as a powerful extractor of fine-grained features that can distinguish subtle differences between classes. The loss function, an indispensable part of a CNN model, is of critical importance for feature extraction. In this work, building on the global structure loss function, we propose a variant of softmax loss, named switched shifted softmax loss, that can potentially reduce overfitting of the model. Comparative experiments with different backbone structures indicate that the proposed loss function, which requires only a trivial transformation, improves the fine-grained retrieval performance of deep learning methods.¹ Furthermore, additional experiments on fine-grained object classification and person re-identification (re-ID) show that our method is applicable to a wide range of other tasks.
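As a rough illustration of the two ingredients mentioned above, the following sketch shows the standard softmax (cross-entropy) loss and a simple cosine-similarity ranking step for retrieval. It is not the switched shifted softmax loss proposed in this work, whose definition is not given in the abstract. The function names and the toy feature vectors are assumptions made purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, label):
    """Standard softmax (cross-entropy) loss for a single sample."""
    return -math.log(softmax(logits)[label])


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query, database, k=2):
    """Return the indices of the k database features most similar to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D features; a real system would use CNN embeddings.
database = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
top = retrieve([1.0, 0.0], database, k=2)
loss = softmax_loss([2.0, 0.5, -1.0], label=0)
```

In a typical pipeline of this kind, the loss is used only during training, and the features extracted by the trained network are then ranked against the query at retrieval time.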
Acknowledgments
This work was supported by the Ph.D. Start-up Fund of Guangdong Polytechnic Normal University (991641258 and 991641231), the Guangzhou Science and Technology Program (105130372030), the National Natural Science Foundation of China (61803090), and the Natural Science Foundation of Guangdong Province (2019A1515012109). We thank Prof. Rongjun Chen for his professional advice on our work. We also thank the associate editor and all the reviewers for their time and their evaluation of our work, which helped us improve the quality and presentation of the paper.
¹ The code is available at https://github.com/Zengxianxian727/FGOR
Zeng, X., Liu, S., Wang, X. et al. SSCRL: fine-grained object retrieval with switched shifted centralized ranking loss. Appl Intell 53, 336–350 (2023). https://doi.org/10.1007/s10489-022-03287-9