Efficient Architecture for Block Parallel Convolution using Two-Dimensional Polyphase Decomposition

Circuits, Systems, and Signal Processing

A Correction to this article was published on 18 October 2021

This article has been updated

Abstract

Ultra-high-definition (UHD) video standards demand processing speeds of 60 to 120 fps, and meeting such speeds requires substantial hardware resources. In this paper, area-efficient and high-speed two-dimensional (2D) \(2\times 2\) and \(3\times 3\) block parallel scalable recursive convolution (BPSRC) architectures are proposed. The \(2\times 2\) and \(3 \times 3\) BPSRC architectures can implement block parallel filters with small to large kernel sizes, provided the kernel size is a multiple of 2 or 3, respectively. In block parallel convolution, the spatial window is partitioned into fixed-size blocks so that the block outputs are processed in parallel. The algorithm proved effective with respect to both area and computation time. Increasing the kernel size does not affect the processing time but does increase the hardware cost; however, this increase is considerably smaller than for conventional block parallel convolution (BPC). Overall, multiplier complexity is reduced by a factor of 4/9 and 9/16 for \(3\times 3\) and \(2 \times 2\) BPSRC implementations of 2D finite impulse response (FIR) filters, respectively, relative to conventional BPC. A throughput of 1.55 giga operations per second is achieved with the \(2\times 2\) BPSRC and 1.86 giga operations per second with the \(3\times 3\) BPSRC on a Virtex-7 XC7VX485T FPGA.
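The 9/16 figure quoted for the \(2\times 2\) case is consistent with a standard fast-FIR (Karatsuba-style) construction applied to a 2D polyphase decomposition, and the following Python sketch illustrates that idea in software. It is not the authors' BPSRC architecture: the function names (`conv2d`, `phase`, `fast_conv2d_2x2`) are hypothetical, the arrays are assumed to have even dimensions, and the code computes a full linear convolution of finite arrays rather than modelling streaming hardware.

```python
def conv2d(x, h):
    """Full 2D linear convolution of two rectangular lists of lists."""
    rows, cols = len(x) + len(h) - 1, len(x[0]) + len(h[0]) - 1
    y = [[0] * cols for _ in range(rows)]
    for i, xrow in enumerate(x):
        for j, xv in enumerate(xrow):
            for k, hrow in enumerate(h):
                for l, hv in enumerate(hrow):
                    y[i + k][j + l] += xv * hv
    return y

def phase(a, p, q):
    """2D polyphase component: rows p, p+2, ... and columns q, q+2, ..."""
    return [row[q::2] for row in a[p::2]]

def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[u - v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def fast_conv2d_2x2(x, h):
    """Convolve via 2x2 polyphase components (assumes even dimensions).

    Returns (y, n) where n is the number of sub-convolutions used.
    """
    calls = [0]

    def subconv(a, b):
        calls[0] += 1
        return conv2d(a, b)

    def three_products(a0, a1, b0, b1):
        # (a0 + z a1)(b0 + z b1) with 3 sub-convolutions instead of 4.
        lo = subconv(a0, b0)
        hi = subconv(a1, b1)
        mid = sub(sub(subconv(add(a0, a1), add(b0, b1)), lo), hi)
        return [lo, mid, hi]

    X = [[phase(x, p, q) for q in range(2)] for p in range(2)]
    H = [[phase(h, p, q) for q in range(2)] for p in range(2)]

    # Same trick along the row dimension, reusing the column-wise products.
    r0 = three_products(X[0][0], X[0][1], H[0][0], H[0][1])
    r2 = three_products(X[1][0], X[1][1], H[1][0], H[1][1])
    rs = three_products(add(X[0][0], X[1][0]), add(X[0][1], X[1][1]),
                        add(H[0][0], H[1][0]), add(H[0][1], H[1][1]))
    r1 = [sub(sub(s, a), b) for s, a, b in zip(rs, r0, r2)]
    R = [r0, r1, r2]

    # Interleave the nine partial results back onto the output grid.
    y = [[0] * (len(x[0]) + len(h[0]) - 1) for _ in range(len(x) + len(h) - 1)]
    for u in range(3):
        for v in range(3):
            for i, row in enumerate(R[u][v]):
                for j, val in enumerate(row):
                    y[2 * i + u][2 * j + v] += val
    return y, calls[0]

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
h = [[1, 0, -1, 2], [2, 1, 0, -1], [0, 3, 1, 1], [-2, 1, 4, 0]]
y, n_sub = fast_conv2d_2x2(x, h)
print(n_sub, y == conv2d(x, h))
```

A direct polyphase formulation would need one sub-convolution for each of the 16 pairings of the four input phases with the four kernel phases. The sketch uses 9, at the cost of extra additions before and after the sub-convolutions. Applying the analogous 3-phase construction with 6 products per dimension would give 36 sub-convolutions instead of 81, which matches the 4/9 ratio stated for the \(3\times 3\) case, although whether BPSRC arrives at these counts in the same way is an assumption here.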

Acknowledgements

We thank Dr. Anish Turlapaty, Assistant Professor, Indian Institute of Information Technology, Sri City, for the discussion and comments that greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Anitha Arumalla.

Ethics declarations

Code availability

The authors declare that the custom code developed to verify the correctness of the architecture is available from the authors on reasonable request.

Availability of data and material

The authors declare that all data and materials, including custom code, supporting the work claimed in the manuscript are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to fix errors in the mathematical symbols.

About this article

Cite this article

Arumalla, A., Makkena, M.L. Efficient Architecture for Block Parallel Convolution using Two-Dimensional Polyphase Decomposition. Circuits Syst Signal Process 41, 1166–1186 (2022). https://doi.org/10.1007/s00034-021-01811-9

  • DOI: https://doi.org/10.1007/s00034-021-01811-9
