Abstract
In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is effective for both low-level semantic applications, such as texture classification, and higher-level semantic tasks, such as natural scene recognition. It is based on the widely used combination of (i) Sparse coding (Sc), (ii) Max-pooling and (iii) Spatial Pyramid Matching (SPM) techniques, applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) as local features. The approach can be viewed as a two-layer hierarchical architecture: the first layer quickly encodes the local spatial patch structure via histograms of LBP/LTP, while the second layer encodes the relationships between pre-analyzed LBP/LTP-scenes/objects. To provide comparative results, we also introduce an alternative two-layer architecture in which the first layer directly encodes multi-scale Differential Vector (DV) local patches instead of histograms of LBP/LTP. Our method outperforms SIFT-based approaches using Sc techniques and can be trained efficiently with a simple linear SVM. Our BoS method achieves accuracies of \(87.46\,\%\) and \(90.35\,\%\) on the Scene-15 and UIUC-Sport datasets, respectively.
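To make the pipeline concrete, the following is a minimal sketch of two of its stages: LBP histograms computed per local patch, then max-pooled over a spatial pyramid. It is an illustration only, not the authors' implementation. The 8-neighbour, radius-1 LBP, the non-overlapping 4x4 patches, the two pyramid levels (1x1 and 2x2) and all function names are assumptions made for this example, and the sparse-coding stage is left out entirely: the normalised patch histograms stand in for the sparse codes.

```python
# Toy sketch (assumptions: 8-neighbour LBP at radius 1, non-overlapping patches,
# a 1x1 + 2x2 pyramid, and NO sparse-coding stage; the normalised patch
# histograms are pooled directly in place of sparse codes).

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of interior pixel (r, c): bit k is set when neighbour k >= centre."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def patch_histogram(img, top, left, size):
    """L1-normalised 256-bin histogram of the LBP codes inside one size x size patch."""
    hist = [0.0] * 256
    for r in range(top, top + size):
        for c in range(left, left + size):
            hist[lbp_code(img, r, c)] += 1.0
    return [h / (size * size) for h in hist]


def max_pool(codes):
    """Element-wise maximum over equal-length vectors (all zeros if the list is empty)."""
    if not codes:
        return [0.0] * 256
    return [max(column) for column in zip(*codes)]


def describe(img, patch=4, levels=(1, 2)):
    """Concatenate max-pooled patch histograms over every cell of every pyramid level."""
    height, width = len(img), len(img[0])
    located = []  # (row fraction, column fraction, histogram) for each patch
    # Start at 1 and stop early so that every pixel in a patch has all 8 neighbours.
    for top in range(1, height - patch, patch):
        for left in range(1, width - patch, patch):
            hist = patch_histogram(img, top, left, patch)
            located.append(((top + patch / 2) / height, (left + patch / 2) / width, hist))
    descriptor = []
    for g in levels:  # g x g grid of cells at this pyramid level
        for i in range(g):
            for j in range(g):
                cell = [h for fr, fc, h in located
                        if min(int(fr * g), g - 1) == i and min(int(fc * g), g - 1) == j]
                descriptor.extend(max_pool(cell))
    return descriptor
```

With the default settings the descriptor has 256 x (1 + 4) = 1280 entries, which could then be passed to a linear classifier. In a fuller pipeline, a sparse-coding step against a learned dictionary would sit between `patch_histogram` and `max_pool`.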
Funded by COGNILEGO ANR 2010-CORD-013 and PEPS RUPTURE Scale Swarm Vision.
Notes
1. \(1\!\!1_{\{x\}}=1\) if event \(x\) is true, \(0\) otherwise.
2. LSc requires storing the sparse codes of the template set, i.e., a sparse matrix of size \((K\times N_{\textit{template}})\).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Paris, S., Halkias, X., Glotin, H. (2015). Beyond SIFT for Image Categorization by Bag-of-Scenes Analysis. In: Fred, A., De Marsico, M. (eds) Pattern Recognition Applications and Methods. Advances in Intelligent Systems and Computing, vol 318. Springer, Cham. https://doi.org/10.1007/978-3-319-12610-4_12
DOI: https://doi.org/10.1007/978-3-319-12610-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12609-8
Online ISBN: 978-3-319-12610-4
eBook Packages: Engineering, Engineering (R0)