A Novel Spatial Layout Representation for Object Recognition

Elfiky, Noha

doi:10.1007/978-3-030-44289-7_52

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1153)

Included in the following conference series:

The International Conference on Artificial Intelligence and Computer Vision

Abstract

Most successful approaches to category-level image classification build on the Bag-of-Words (BoW) model. To incorporate spatial image information into the BoW model, Spatial Pyramids (SPs) are used. However, spatial pyramids are rigid: they are based on predefined grid configurations. As a consequence, they often fail to coincide with the underlying spatial structure of images from different categories, which may reduce classification accuracy.
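To make the fixed-grid representation concrete, the following is a minimal sketch of a standard spatial pyramid histogram, assuming keypoints have already been quantised into visual words. The function name, its parameters, and the normalised coordinate convention are illustrative assumptions, not taken from the paper, and the per-level weighting used in some pyramid matching schemes is omitted.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over a fixed grid pyramid.

    xy:     (N, 2) keypoint positions, normalised to [0, 1] (assumed convention).
    words:  (N,) visual-word index of each keypoint, in [0, vocab_size).
    levels: highest pyramid level; level l splits the image into 2**l x 2**l cells.
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Map each keypoint to its grid cell at this level (clamp points at 1.0).
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = row * cells + col
        # A joint (cell, word) index lets one bincount build every cell histogram.
        flat = cell_id * vocab_size + words
        parts.append(np.bincount(flat, minlength=cells * cells * vocab_size))
    hist = np.concatenate(parts).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Because the cell boundaries depend only on `levels`, every image of every category is partitioned in the same way, which is the rigidity the abstract refers to.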

This study addresses this problem with a selective Spatial Pyramid (SP) representation that automatically learns the most appropriate shape for each category. The proposed approach builds the image representation by inferring the constituent geometrical parts. As a result, the representation retains descriptive spatial information and yields a structural description of the image. Large-scale experiments on Pascal VOC2007 and Caltech101 show that SPs obtained by selective search outperform standard SPs.
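The abstract does not describe the selection procedure, so the following is only a hedged sketch of the general idea of choosing a layout per category from a set of candidates. The rectangle-based layout format, the function names, and the caller-supplied `score` function (for example, validation accuracy for one category) are all hypothetical and should not be read as the paper's method.

```python
import numpy as np

def layout_histogram(xy, words, vocab_size, regions):
    """Concatenate visual-word histograms over arbitrary rectangular regions.

    regions: list of (x0, y0, x1, y1) rectangles in normalised coordinates;
             a keypoint belongs to a region if x0 <= x < x1 and y0 <= y < y1.
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for x0, y0, x1, y1 in regions:
        inside = ((xy[:, 0] >= x0) & (xy[:, 0] < x1) &
                  (xy[:, 1] >= y0) & (xy[:, 1] < y1))
        parts.append(np.bincount(words[inside], minlength=vocab_size))
    return np.concatenate(parts).astype(float)

def select_layout(candidates, score):
    """Return the name of the candidate layout with the highest score.

    candidates: dict mapping a layout name to its list of regions.
    score:      hypothetical callable that rates a list of regions for one
                category; higher is better.
    """
    return max(candidates, key=lambda name: score(candidates[name]))
```

Running `select_layout` once per category, each time with a score computed on that category's data, would give every category its own layout instead of a shared grid.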

Author information

Authors and Affiliations

Saint Mary’s College of California, Moraga, CA, USA
Noha Elfiky

Corresponding author

Correspondence to Noha Elfiky.

Editor information

Editors and Affiliations

Information Technology Department, Cairo University, Faculty of Computers and Information, Giza, Egypt
Aboul-Ella Hassanien
Faculty of Computers and Information, Benha University, Banha, Egypt
Ahmad Taher Azar
Faculty of Computers and Information, Suez Canal University, Ismailia, Egypt
Tarek Gaber
Departamento de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Guadalajara, Jalisco, Mexico
Diego Oliva
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Fahmy M. Tolba

About this paper

Cite this paper

Elfiky, N. (2020). A Novel Spatial Layout Representation for Object Recognition. In: Hassanien, AE., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol 1153. Springer, Cham. https://doi.org/10.1007/978-3-030-44289-7_52

DOI: https://doi.org/10.1007/978-3-030-44289-7_52
Published: 24 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44288-0
Online ISBN: 978-3-030-44289-7
eBook Packages: Intelligent Technologies and Robotics (R0)
