Abstract
Ultra-high-definition (UHD) video standards demand processing rates of 60 to 120 frames per second (fps), and sustaining such rates requires substantial hardware resources. This paper proposes area-efficient, high-speed two-dimensional (2D) \(2\times 2\) and \(3\times 3\) block parallel scalable recursive convolution (BPSRC) architectures. The \(2\times 2\) and \(3\times 3\) BPSRC architectures implement block parallel filters with small to large kernels whose sizes are restricted to multiples of 2 and 3, respectively. In block parallel convolution, the spatial window is partitioned into fixed-size blocks so that the outputs of each block are computed in parallel. The algorithm proves effective with respect to both area and computation time. Increasing the kernel size does not affect the processing time but does increase the hardware cost; this increase, however, is considerably smaller than with conventional block parallel convolution (BPC). Overall, for 2D finite impulse response (FIR) filters, the multiplier complexity is reduced to 4/9 and 9/16 of that of conventional BPC for the \(3\times 3\) and \(2\times 2\) BPSRC implementations, respectively. A throughput of 1.55 giga operations per second is achieved with \(2\times 2\) BPSRC and 1.86 giga operations per second with \(3\times 3\) BPSRC on a Virtex-7 XC7VX485T FPGA.
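The 9/16 figure quoted for the \(2\times 2\) case can be illustrated with a small software sketch. It assumes the saving comes from a standard fast short-convolution identity applied once per dimension to the 2D polyphase components (3 sub-filters instead of 4 per dimension, hence 9 instead of 16); whether the BPSRC architecture derives its structure in exactly this way is an assumption, not something the abstract states. All function names (`conv2d`, `phase`, `bpc_2x2`, `fast_2x2`) are hypothetical, the arrays are finite lists rather than a streaming hardware datapath, and both input and kernel are assumed to have even dimensions.

```python
from itertools import product

calls = [0]  # running count of sub-filter convolutions evaluated


def conv2d(x, h):
    """Full 2D convolution of two rectangular lists of lists."""
    calls[0] += 1
    y = [[0] * (len(x[0]) + len(h[0]) - 1) for _ in range(len(x) + len(h) - 1)]
    for i, j in product(range(len(x)), range(len(x[0]))):
        for k, l in product(range(len(h)), range(len(h[0]))):
            y[i + k][j + l] += x[i][j] * h[k][l]
    return y


def phase(a, r, c):
    """2D polyphase component: rows r, r+2, ... and columns c, c+2, ..."""
    return [row[c::2] for row in a[r::2]]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def empty_output(x, h):
    return [[0] * (len(x[0]) + len(h[0]) - 1) for _ in range(len(x) + len(h) - 1)]


def scatter(y, block, p, q):
    """Accumulate a sub-filter output onto the output grid at offset (p, q)."""
    for i, row in enumerate(block):
        for j, v in enumerate(row):
            y[2 * i + p][2 * j + q] += v


def bpc_2x2(x, h):
    """Conventional 2x2 block form: every input phase meets every kernel phase."""
    start = calls[0]
    y = empty_output(x, h)
    for a, b, c, d in product(range(2), repeat=4):
        scatter(y, conv2d(phase(x, a, b), phase(h, c, d)), a + c, b + d)
    return y, calls[0] - start


def fast_cols(A, B):
    """Column-direction identity: 3 sub-convolutions replace 4."""
    p0 = conv2d(A[0], B[0])
    p2 = conv2d(A[1], B[1])
    pm = conv2d(add(A[0], A[1]), add(B[0], B[1]))
    return [p0, sub(sub(pm, p0), p2), p2]


def fast_2x2(x, h):
    """Same identity applied along rows, on top of fast_cols: 3 * 3 = 9."""
    start = calls[0]
    X = [[phase(x, r, c) for c in range(2)] for r in range(2)]
    H = [[phase(h, r, c) for c in range(2)] for r in range(2)]
    q0 = fast_cols(X[0], H[0])
    q2 = fast_cols(X[1], H[1])
    qm = fast_cols([add(X[0][c], X[1][c]) for c in range(2)],
                   [add(H[0][c], H[1][c]) for c in range(2)])
    q1 = [sub(sub(m, a), b) for m, a, b in zip(qm, q0, q2)]
    y = empty_output(x, h)
    for p, q_row in enumerate([q0, q1, q2]):
        for q, block in enumerate(q_row):
            scatter(y, block, p, q)
    return y, calls[0] - start
```

Calling `bpc_2x2(x, h)` and `fast_2x2(x, h)` on the same even-sized integer arrays returns the same output as `conv2d(x, h)`, with sub-filter counts of 16 and 9 respectively. Each sub-filter has a quarter of the kernel taps, so the count ratio carries over directly to multipliers in this sketch; the extra pre- and post-additions are the price paid for the saving.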
Change history
18 October 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00034-021-01863-x
Acknowledgements
We thank Dr. Anish Turlapaty, Assistant Professor, Indian Institute of Information Technology, Sri City, for the discussion and comments that greatly improved the manuscript.
Ethics declarations
Code availability
Authors declare that the custom code developed to verify the correctness of architecture is available with the authors on reasonable request.
Availability of data and material
Authors declare that all data and materials including custom code support the work claimed in the manuscript are available from the corresponding author on reasonable request.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised to fix errors in the mathematical symbols.
About this article
Cite this article
Arumalla, A., Makkena, M.L. Efficient Architecture for Block Parallel Convolution using Two-Dimensional Polyphase Decomposition. Circuits Syst Signal Process 41, 1166–1186 (2022). https://doi.org/10.1007/s00034-021-01811-9
DOI: https://doi.org/10.1007/s00034-021-01811-9