Beyond SIFT for Image Categorization by Bag-of-Scenes Analysis

  • Conference paper
Pattern Recognition Applications and Methods

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 318)

Abstract

In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient both for low-level semantic applications, such as texture classification, and for higher-level semantic tasks, such as natural scene recognition. It is based on the widely used combination of (i) Sparse Coding (Sc), (ii) max-pooling and (iii) Spatial Pyramid Matching (SPM), applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) used as local features. This approach can be viewed as a two-layer hierarchical architecture: the first layer quickly encodes the local spatial structure of patches via histograms of LBP/LTP, while the second layer encodes the relationships between pre-analyzed LBP/LTP scenes/objects. To provide comparative results, we also introduce an alternative two-layer architecture in which the first layer directly encodes multi-scale Differential Vector (DV) local patches instead of histograms of LBP/LTP. Our method outperforms SIFT-based approaches using Sc techniques and can be trained efficiently with a simple linear SVM. Our BoS method achieves \(87.46\,\%\) and \(90.35\,\%\) accuracy on the Scene-15 and UIUC-Sport datasets, respectively.
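The second layer described above (sparse coding, then max-pooling over a spatial pyramid) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation; the following are assumptions of this sketch only: the dictionary `D` is given rather than learned, codes are computed with plain ISTA (our choice of lasso solver), pooling takes the maximum of absolute code values, and the pyramid levels are 1×1, 2×2 and 4×4. The linear SVM that would consume the pooled vector is not shown, and the helper names (`sparse_code`, `spm_max_pool`) are hypothetical.

```python
import math


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink v towards zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - sum_k a_k D[k]||^2 + lam*||a||_1 (ISTA).

    D is a list of K atoms, each a list with the same length as x.
    """
    K, n = len(D), len(x)
    # Step 1/L with L = ||D||_F^2, an upper bound on the largest eigenvalue
    # of D^T D (safe but loose, so convergence may be slow).
    L = sum(v * v for atom in D for v in atom) or 1.0
    a = [0.0] * K
    for _ in range(n_iter):
        # Residual r = D a - x
        r = [sum(a[k] * D[k][i] for k in range(K)) - x[i] for i in range(n)]
        # Gradient of the quadratic term: g_k = <D[k], r>
        g = [sum(D[k][i] * r[i] for i in range(n)) for k in range(K)]
        a = [soft_threshold(a[k] - g[k] / L, lam / L) for k in range(K)]
    return a


def spm_max_pool(codes, positions, width, height, levels=(1, 2, 4)):
    """Max-pool |codes| inside every cell of an l x l grid, for each level l.

    codes[j] is the K-dim sparse code of patch j, centred at positions[j] = (x, y).
    Returns a flat vector of K * sum(l * l for l in levels) values.
    """
    K = len(codes[0])
    feature = []
    for l in levels:
        cells = [[0.0] * K for _ in range(l * l)]
        for a, (x, y) in zip(codes, positions):
            # Map the patch centre to its grid cell at this level.
            cx = min(int(x * l / width), l - 1)
            cy = min(int(y * l / height), l - 1)
            cell = cells[cy * l + cx]
            for k in range(K):
                cell[k] = max(cell[k], abs(a[k]))
        for cell in cells:
            feature.extend(cell)
    return feature


if __name__ == "__main__":
    # Toy example: K = 3 atoms in 2-D, four patches on a 4 x 4 image.
    D = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]
    patches = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]
    positions = [(0, 0), (3, 0), (0, 3), (3, 3)]
    codes = [sparse_code(p, D, lam=0.05) for p in patches]
    feature = spm_max_pool(codes, positions, width=4, height=4)
    print(len(feature))  # 3 * (1 + 4 + 16) = 63
```

Max-pooling keeps, for each dictionary atom, only its strongest response within a cell, so the final vector length depends on the dictionary size and the pyramid, not on the number of patches. That fixed-length vector is what a linear SVM would then be trained on.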

Funded by COGNILEGO ANR 2010-CORD-013 and PEPS RUPTURE Scale Swarm Vision.

Notes

  1. \(1\!\!1_{\{x\}}=1\) if event \(x\) is true, and \(0\) otherwise.

  2. LSc requires storing the sparse codes of the template set, i.e., a sparse matrix of size \((K\times N_{\textit{template}})\).
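The indicator function of note 1 is what turns gray-level comparisons into the LBP/LTP codes used as local features. The sketch below assumes the standard single-scale, 8-neighbour, radius-1 definition \(\mathrm{LBP} = \sum_{p=0}^{7} 1\!\!1_{\{g_p \ge g_c\}}\,2^p\) (Ojala et al.) and the Tan and Triggs split of an LTP into an upper and a lower binary code with threshold `t`. The neighbour ordering, the value of `t`, and the helper names are hypothetical choices of this sketch; the paper's multi-scale variant is not reproduced here.

```python
# Offsets (dy, dx) of the 8 neighbours, visited clockwise from the top-left
# (an arbitrary but fixed ordering assumed for this sketch).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """LBP code of pixel (y, x): bit p is the indicator 1{g_p >= g_c}."""
    gc = img[y][x]
    return sum((1 if img[y + dy][x + dx] >= gc else 0) << p
               for p, (dy, dx) in enumerate(OFFSETS))


def ltp_codes(img, y, x, t):
    """Return the (upper, lower) binary codes of the Local Ternary Pattern.

    A neighbour sets the upper bit if g_p >= g_c + t, the lower bit if
    g_p <= g_c - t, and neither bit if it lies within (g_c - t, g_c + t).
    """
    gc = img[y][x]
    upper = lower = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        g = img[y + dy][x + dx]
        if g >= gc + t:
            upper |= 1 << p
        elif g <= gc - t:
            lower |= 1 << p
    return upper, lower


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a patch."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist


if __name__ == "__main__":
    flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(lbp_code(flat, 1, 1))  # 255: every neighbour satisfies g_p >= g_c
```

On a flat patch every comparison \(g_p \ge g_c\) holds, so all eight bits are set; the LTP threshold `t` is what lets such near-uniform regions map to "no change" instead.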

Author information

Correspondence to Sébastien Paris.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Paris, S., Halkias, X., Glotin, H. (2015). Beyond SIFT for Image Categorization by Bag-of-Scenes Analysis. In: Fred, A., De Marsico, M. (eds) Pattern Recognition Applications and Methods. Advances in Intelligent Systems and Computing, vol 318. Springer, Cham. https://doi.org/10.1007/978-3-319-12610-4_12

  • DOI: https://doi.org/10.1007/978-3-319-12610-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12609-8

  • Online ISBN: 978-3-319-12610-4

  • eBook Packages: Engineering, Engineering (R0)
