Algorithmic Optimizations in the HMAX Model Targeted for Efficient Object Recognition

  • Conference paper
Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2015)

Abstract

In this paper, we propose various approximations aimed at increasing the accuracy of the S1, C1 and S2 layers of the original Gray HMAX model of the visual cortex. At layer S1, an image is convolved with 64 separable Gabor filters in the spatial domain after removing some irrelevant information such as illumination and expression variations. At layer C1, some of the minimum scale values are exploited in addition to the maximum ones in order to increase the model's accuracy. Using an embedding in the additive domain, these minimum scale values are embedded into their corresponding maximum values according to a weight between 0 and 1. At layer S2, we apply clustering, which is considered one of the most interesting research areas in the field of data mining, to improve how the prototypes are selected during the feature learning stage. This is achieved with the Partitioning Around Medoids (PAM) clustering algorithm. The impact of these approximations on accuracy and computational complexity was evaluated on the Caltech101 dataset, which contains a total of 9,145 images split between 101 distinct object categories in addition to a background category, and was compared with the baseline performance using support vector machine (SVM) and nearest neighbor (NN) classifiers. The results show that our model improves accuracy at the S1 layer by more than 10% while also reducing computational complexity. Accuracy increases slightly for the approximations at the C1 and S2 layers.
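The C1 step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one plausible reading of the additive embedding: for two response maps from adjacent scales, the element-wise minimum is scaled by the weight and added to the element-wise maximum. The function name `embed_min_into_max` and the list-of-lists map representation are hypothetical and are not taken from the paper.

```python
def embed_min_into_max(scale_a, scale_b, weight):
    """Combine two same-sized response maps from adjacent scales.

    ASSUMPTION (not from the paper): the additive embedding is read here
    as max + weight * min, applied element by element. With weight = 0
    this reduces to the plain max pooling over scales of baseline HMAX.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(scale_a) != len(scale_b):
        raise ValueError("maps must have the same number of rows")

    combined = []
    for row_a, row_b in zip(scale_a, scale_b):
        if len(row_a) != len(row_b):
            raise ValueError("maps must have the same number of columns")
        # Keep the maximum response and add a weighted share of the minimum.
        combined.append(
            [max(a, b) + weight * min(a, b) for a, b in zip(row_a, row_b)]
        )
    return combined


if __name__ == "__main__":
    # Two toy 1x2 response maps at adjacent scales (made-up values).
    print(embed_min_into_max([[1.0, 4.0]], [[3.0, 2.0]], 0.5))
```

Setting the weight to 0 recovers ordinary max pooling across scales, which makes it easy to compare this variant against the baseline within the same code path.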

Author information

Corresponding author

Correspondence to Ali Chehab.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bitar, A.W., Mansour, M.M., Chehab, A. (2016). Algorithmic Optimizations in the HMAX Model Targeted for Efficient Object Recognition. In: Braz, J., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2015. Communications in Computer and Information Science, vol 598. Springer, Cham. https://doi.org/10.1007/978-3-319-29971-6_20

  • DOI: https://doi.org/10.1007/978-3-319-29971-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29970-9

  • Online ISBN: 978-3-319-29971-6

  • eBook Packages: Computer Science, Computer Science (R0)
