Deep Relative Attributes

Souri, Yaser; Noury, Erfan; Adeli, Ehsan

doi:10.1007/978-3-319-54193-8_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10115)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Visual attributes are a great means of describing images or scenes in a way that both humans and computers understand. Relative attributes were introduced to establish a correspondence between images and to compare the strength of each property across images. Since their introduction, however, hand-crafted and engineered features have been used to learn increasingly complex models for the problem of relative attributes, which limits the applicability of those methods to more realistic cases. We introduce a deep neural network architecture for the task of relative attribute prediction. A convolutional neural network (ConvNet) is adopted to learn the features, with an additional layer (the ranking layer) that learns to rank the images based on these features. We adopt an appropriate ranking loss to train the whole network in an end-to-end fashion. Our proposed method outperforms the baseline and state-of-the-art methods in relative attribute prediction on various coarse and fine-grained datasets. Our qualitative results, along with the visualization of the saliency maps, show that the network is able to learn effective features for each specific attribute. Source code of the proposed method is available at https://github.com/yassersouri/ghiaseddin.
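
The abstract does not say which ranking loss is used, so the following is only a minimal sketch of one common choice: a RankNet-style pairwise cross-entropy on the difference of two scalar scores. The function names (`rank_score`, `pairwise_ranking_loss`), the single linear ranking unit, and the target encoding (1.0, 0.0, 0.5) are assumptions made for illustration and are not taken from the paper or its source code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rank_score(features, weights, bias=0.0):
    """Hypothetical ranking layer: one linear unit mapping features to a scalar."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def pairwise_ranking_loss(score_i, score_j, target):
    """RankNet-style cross-entropy on P(i > j) = sigmoid(score_i - score_j).

    target (assumed encoding): 1.0 if image i shows more of the attribute,
    0.0 if image j does, 0.5 if the pair is judged similar.
    """
    diff = score_i - score_j
    # -log(sigmoid(diff)) == softplus(-diff); -log(1 - sigmoid(diff)) == softplus(diff)
    return target * softplus(-diff) + (1.0 - target) * softplus(diff)


# Usage: score two made-up feature vectors with the same made-up weights,
# then penalize the pair if the ordering disagrees with the label.
weights = [0.5, -0.25]
s_i = rank_score([1.0, 2.0], weights)
s_j = rank_score([3.0, 1.0], weights)
loss = pairwise_ranking_loss(s_i, s_j, target=0.0)  # label: j shows more
```

Because the loss depends only on the score difference, both images of a pair can pass through the same network with shared weights, and gradients from the loss reach the feature layers as well as the ranking unit. That is the sense in which such a model can be trained end to end.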

Acknowledgments

We would like to thank the Computer Engineering Department of Sharif University of Technology and the HPC center of IPM for their support with computational resources.

Author information

Authors and Affiliations

Sobhe, Tehran, Iran
Yaser Souri
Sharif University of Technology, Tehran, Iran
Erfan Noury
University of North Carolina at Chapel Hill, Chapel Hill, USA
Ehsan Adeli

Corresponding author

Correspondence to Yaser Souri.

Editor information

Editors and Affiliations

National Tsing Hua University, Hsinchu, Taiwan
Shang-Hong Lai
Graz University of Technology, Graz, Austria
Vincent Lepetit
Drexel University, Philadelphia, Pennsylvania, USA
Ko Nishino
The University of Tokyo, Tokyo, Japan
Yoichi Sato

About this paper

Cite this paper

Souri, Y., Noury, E., Adeli, E. (2017). Deep Relative Attributes. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10115. Springer, Cham. https://doi.org/10.1007/978-3-319-54193-8_8

DOI: https://doi.org/10.1007/978-3-319-54193-8_8
Published: 11 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54192-1
Online ISBN: 978-3-319-54193-8
eBook Packages: Computer Science, Computer Science (R0)
