Speeding up spatiotemporal feature extraction using GPU

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Spatiotemporal feature extraction algorithms are widely used in many image processing and computer vision applications. They are favored because the features they generate are robust. However, they have high computational complexity, so parallelizing them to speed up their execution is of great importance. In this paper, we propose new parallel implementations, using GPU computing, of the two most widely used spatiotemporal feature extraction algorithms: the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). Our implementations solve problems found in previous parallel implementations, such as load imbalance, thread synchronization, and the use of atomic operations. They speed up execution by processing all the work of each stage of the two algorithms simultaneously, rather than dividing that stage into smaller sequential steps. In addition, the way our implementations allocate threads increases the occupancy of the GPU streaming multiprocessors (SMs). We compare our implementations with previous CPU and GPU parallel implementations of the two algorithms. The results show that the proposed implementations can perform all processing in real time with high accuracy, and that they achieve higher speedup, frame rate, and SM occupancy than the best previously known parallel implementations of the two algorithms.
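
The abstract states that the proposed implementations avoid atomic operations. One widely used way to do this on a GPU is stream compaction. This is shown here as a general technique, not necessarily the authors' exact method. Each thread flags whether its element (for example, a candidate keypoint) survives. An exclusive prefix sum over the flags then gives every survivor a unique output slot, so survivors can be scattered to those slots without an atomic counter. The sketch below simulates this in plain Python using a Blelloch-style work-efficient scan [22, 24]. The function names and the simple threshold test are hypothetical. Each inner loop stands in for one barrier-separated step in which every iteration would run as an independent GPU thread.

```python
from typing import List, Sequence


def blelloch_exclusive_scan(values: Sequence[int]) -> List[int]:
    """Work-efficient exclusive prefix sum (up-sweep then down-sweep).

    Every iteration of an inner `for` loop touches distinct elements, so on a
    GPU each iteration could be one thread, with a barrier between levels.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)  # pad to a power of two

    # Up-sweep: build partial sums in place (a reduction tree).
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a[:n]


def compact_keypoints(responses: Sequence[float], threshold: float) -> List[int]:
    """Return indices of responses above `threshold`, in order, without atomics."""
    flags = [1 if r > threshold else 0 for r in responses]  # one thread per element
    offsets = blelloch_exclusive_scan(flags)                # unique slot per survivor
    count = (offsets[-1] + flags[-1]) if flags else 0
    out = [0] * count
    for idx, (keep, slot) in enumerate(zip(flags, offsets)):  # independent scatter
        if keep:
            out[slot] = idx
    return out


if __name__ == "__main__":
    print(compact_keypoints([0.1, 0.9, 0.3, 0.8, 0.95], 0.5))  # [1, 3, 4]
```

In the example call, the flags `[0, 1, 0, 1, 1]` scan to offsets `[0, 0, 1, 1, 2]`. The survivors at indices 1, 3, and 4 therefore land in slots 0, 1, and 2, and no two threads ever write the same slot.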

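SM occupancy, mentioned in the abstract, is the ratio of warps resident on an SM to the maximum number of warps that SM can hold. It is capped by whichever per-SM resource a given block configuration exhausts first: thread slots, block slots, registers, or shared memory. The following Python sketch computes a simplified theoretical occupancy. The default limits are assumptions chosen to resemble a Pascal GP100 SM (compute capability 6.0) [31, 32]. The sketch also ignores allocation granularity, so for real kernels the CUDA Occupancy Calculator or the runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor` should be preferred. `SMLimits` and `theoretical_occupancy` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Assumed per-SM limits, chosen to resemble a Pascal GP100 SM (compute capability 6.0).
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536          # 32-bit registers per SM
    shared_mem_bytes: int = 65536   # shared memory per SM
    max_threads_per_block: int = 1024
    warp_size: int = 32


def theoretical_occupancy(threads_per_block: int, regs_per_thread: int,
                          smem_per_block: int, sm: SMLimits = SMLimits()) -> float:
    """Fraction of the SM's warp slots that can be resident at once.

    Simplification: registers and shared memory are treated as perfectly
    divisible, i.e. hardware allocation granularity is ignored.
    """
    if not 1 <= threads_per_block <= sm.max_threads_per_block:
        raise ValueError("threads_per_block out of range")
    warps_per_block = math.ceil(threads_per_block / sm.warp_size)
    max_warps = sm.max_threads // sm.warp_size

    # Number of resident blocks allowed by each resource; the smallest wins.
    limits = [sm.max_blocks, max_warps // warps_per_block]
    if regs_per_thread > 0:
        # Registers are allocated per warp, so count whole warps.
        regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
        limits.append(sm.registers // regs_per_block)
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)

    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / max_warps


if __name__ == "__main__":
    for tpb, regs, smem in [(32, 16, 0), (256, 32, 0), (256, 64, 0), (128, 32, 16384)]:
        occ = theoretical_occupancy(tpb, regs, smem)
        print(f"{tpb:4d} threads, {regs:2d} regs, {smem:5d} B smem -> {occ:.0%}")
```

Under these assumptions, a 32-thread block reaches only 50% occupancy because the per-SM block limit is hit first. By contrast, 256-thread blocks using 64 registers per thread are held to 50% by the register file. This illustrates why the way threads are allocated to blocks affects SM occupancy.
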
Figs. 1–27 (images not included)

References

  1. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  2. Laptev, I., Lindeberg, T.: Local descriptors for spatio-temporal recognition. Lect. Notes Comput. Sci. 3667, 91–103 (2006)

  3. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)

  4. Lee, C., Rhee, C.E., Lee, H.-J.: Complexity reduction by modified scale-space construction in SIFT generation optimized for a mobile GPU. IEEE Trans. Circuits Syst. Video Technol. 27(10), 2246–2259 (2017)

  5. Zhang, Q., Chen, Y., Zhang, Y., Xu, Y.: SIFT implementation and optimization for multi-core systems. In: IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), pp. 1–8. IEEE (2008)

  6. Moren, K., Göhringer, D.: A framework for accelerating local feature extraction with OpenCL on multi-core CPUs and co-processors. J. Real-Time Image Process. 10(1007), 1–18 (2016)

  7. Zhu, F., Chen, P., Yang, D., Zhang, W., Chen, H., Zang, B.: A GPU-based high-throughput image retrieval algorithm. In: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, pp. 30–37. ACM (2012)

  8. Yan, W., Shi, X., Yan, X., Wang, L.: Computing OpenSURF on OpenCL and general purpose GPU. Int. J. Adv. Robot. Syst. 10(10), 375 (2013)

  9. Lu, Y., Li, Y., Song, B., Zhang, W., Chen, H., Peng, L.: Parallelizing image feature extraction algorithms on multi-core platforms. J. Parallel Distrib. Comput. 92, 1–14 (2016)

  10. Luebke, D.: CUDA: scalable parallel programming for high-performance scientific computing. In: 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI 2008), pp. 836–838. IEEE (2008)

  11. Hwu, W.-M.W.: GPU Computing Gems Emerald Edition. Elsevier, Amsterdam (2011)

  12. Brown, M., Lowe, D.G.: Invariant features from interest point groups. In: Proceedings of the British Machine Vision Conference (BMVC 2002), pp. 253–262 (2002)

  13. Antonini, M., Barlaud, M., Mathieu, P., Daubechies, I.: Image coding using wavelet transform. IEEE Trans. Image Process. 1(2), 205–220 (1992)

  14. Heymann, S., Muller, K., Smolic, A., Frohlich, B., Wiegand, F.: SIFT implementation and optimization for general-purpose GPU. In: Proceedings of the International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, (2007)

  15. Sinha, S.N., Frahm, J.-M., Pollefeys, M., Genc, Y.: GPU-based video feature tracking and matching. In: EDGE, Workshop on Edge Computing Using New Commodity Architectures, vol. 278, p. 4321 (2006)

  16. Sinha, S., Frahm, J.-M., Pollefeys, M., Genc, Y.: Feature tracking and matching in video using programmable graphics hardware. Mach. Vis. Appl. 22(1), 207–217 (2007)

  17. Wu, C.: SiftGPU: a GPU implementation of scale invariant feature transform, https://github.com/pitzer/SiftGPU (2012)

  18. Vedaldi, A.: An open implementation of the SIFT detector and descriptor. UCLA CSD, http://vision.ucla.edu/~vedaldi/code/sift.html (2007)

  19. Yonglong, Z., Kuizhi, M., Xiang, J., Peixiang, D.: Parallelization and optimization of SIFT on GPU using CUDA. In: 2013 IEEE 10th International Conference on High Performance Computing and Communications and 2013 IEEE International Conference on Embedded and Ubiquitous Computing (HPCC_EUC), pp. 1351–1358. IEEE (2013)

  20. Mohammadi, M., Rezaeian, M.: Towards affordable computing: SiftCU a simple but elegant GPU-based implementation of SIFT. Int. J. Comput. Appl. 90(7), 30–37 (2014)

  21. Acharya, K., Babu, R.V., Vadhiyar, S.S.: A real-time implementation of SIFT using GPU. J. Real-Time Image Process., 1–11 (2014). https://doi.org/10.1007/s11554-014-0446-6

  22. Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with CUDA. GPU Gems 3(39), 851–876 (2007)

  23. Terriberry, T., French, L., Helmsen, J.: GPU accelerating speeded-up robust features. In: Proceedings of 3DPVT, pp. 355–362 (2008)

  24. Blelloch, G.: Prefix sums and their applications. In: Reif, J.H. (ed.) Synthesis of Parallel Algorithms. Morgan Kaufmann, San Francisco (1993)

  25. Bilgic, B., Horn, B.K., Masaki, I.: Efficient integral image computation on the GPU. In: 2010 IEEE Intelligent Vehicles Symposium (IV), pp. 528–533. IEEE (2010)

  26. Fang, Z., Yang, D., Zhang, W., Chen, H., Zang, B.: A comprehensive analysis and parallelization of an image retrieval algorithm. In: 2011 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 154–164. IEEE (2011)

  27. Schulz, A., Jung, F., Hartte, S.: CUDA SURF: a real-time implementation for SURF. https://www.d2.mpi-inf.mpg.de/surf (2011)

  28. Cheon, S., Eom, I.K., Ha, S.W., Moon, Y.H.: An enhanced SURF algorithm based on new interest point detection procedure and fast computation technique. J. Real-Time Image Process (2016). https://doi.org/10.1007/s11554-016-0614-y

  29. Hong, S., Kim, H.: An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. ACM SIGARCH Comput. Archit. News 37(3), 152–163 (2009)

  30. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach. Elsevier, Amsterdam (2011)

  31. NVIDIA: NVIDIA Tesla P100: the most advanced datacenter accelerator ever built, featuring Pascal GP100, the world’s fastest GPU. Whitepaper. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf

  32. NVIDIA: CUDA C Programming Guide v9.1. NVIDIA Corporation, Santa Clara (2017)

  33. Barandiaran, I., Cortes, C., Nieto, M., Grana, M., Ruiz, O.E.: A new evaluation framework and image dataset for keypoint extraction and feature descriptor matching. In: Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), vol. 1, pp. 252–257 (2013)

  34. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005)

  35. Van Rijsbergen, C.: Information Retrieval, vol. 14. Department of Computer Science, University of Glasgow. citeseer.ist.psu.edu/vanrijsbergen79information.html (1979)

Author information

Corresponding author

Correspondence to Ahmed Mehrez.

About this article

Cite this article

Mehrez, A., Morgan, A.A. & Hemayed, E.E. Speeding up spatiotemporal feature extraction using GPU. J Real-Time Image Proc 16, 2379–2407 (2019). https://doi.org/10.1007/s11554-018-0755-2

  • DOI: https://doi.org/10.1007/s11554-018-0755-2