Efficient Architecture for Block Parallel Convolution using Two-Dimensional Polyphase Decomposition

Arumalla, Anitha; Makkena, Madhavi Latha

doi:10.1007/s00034-021-01811-9

Published: 16 August 2021

Circuits, Systems, and Signal Processing, volume 41, pages 1166–1186 (2022)

A Correction to this article was published on 18 October 2021

This article has been updated

Abstract

Ultra-high-definition (UHD) video standards demand processing speeds of 60 to 120 fps, and sustaining such rates requires substantial hardware resources. In this paper, area-efficient and high-speed two-dimensional (2D) \(2\times 2\) and \(3\times 3\) block parallel scalable recursive convolution (BPSRC) architectures are proposed. The \(2\times 2\) and \(3\times 3\) BPSRC architectures implement block parallel filters with small to large kernel sizes, limited to multiples of 2 and 3, respectively. In block parallel convolution, the spatial window is partitioned into fixed-size blocks so that the outputs of a block are computed in parallel. The algorithm proves effective with respect to both area and computation time. Increasing the kernel size does not affect the processing time but does increase the hardware cost; this increase is, however, considerably smaller than for conventional block parallel convolution (BPC). Overall, multiplier complexity is reduced by a factor of 4/9 and 9/16 for the \(3\times 3\) and \(2\times 2\) BPSRC implementations of 2D finite impulse response (FIR) filters, respectively, relative to conventional BPC. A throughput of 1.55 giga-operations per second is achieved with the \(2\times 2\) BPSRC, and 1.86 giga-operations per second with the \(3\times 3\) BPSRC, on a Virtex-7 XC7VX485T FPGA.
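
The idea of a \(2\times 2\) block parallel convolution built on a 2D polyphase decomposition can be illustrated in software. The sketch below is not the authors' BPSRC architecture; it assumes the standard two-parallel fast FIR identity (three sub-filter products instead of four) applied along rows and then along columns, which gives nine sub-convolutions in place of the sixteen a direct \(2\times 2\) block scheme would need. That count is consistent with the 9/16 multiplier figure quoted in the abstract, but whether the paper organizes its hardware this way is an assumption. All function names are hypothetical, and arrays are stored as sparse dictionaries purely to keep shifts and sizes simple.

```python
from itertools import product
import random

def conv2d(a, b):
    """Full 2D linear convolution of sparse arrays {(row, col): value}."""
    out = {}
    for (i, j), u in a.items():
        for (k, l), v in b.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + u * v
    return out

def add(a, b, sign=1):
    """Return a + sign * b, element by element."""
    out = dict(a)
    for key, v in b.items():
        out[key] = out.get(key, 0) + sign * v
    return out

def shift(a, dr, dc):
    """Delay an array by dr rows and dc columns."""
    return {(i + dr, j + dc): v for (i, j), v in a.items()}

def polyphase(a):
    """2x2 polyphase split: comp[p][q][(m, n)] = a[(2m + p, 2n + q)]."""
    comp = [[{}, {}], [{}, {}]]
    for (i, j), v in a.items():
        comp[i % 2][j % 2][(i // 2, j // 2)] = v
    return comp

def pre_add(c):
    """Extend 2x2 phases to 3x3 terms; index 2 means phase 0 + phase 1."""
    rows = [c[0], c[1], [add(c[0][q], c[1][q]) for q in range(2)]]
    return [[r[0], r[1], add(r[0], r[1])] for r in rows]

def post_add(p, dr, dc):
    """Combine three products into two output phases along one axis."""
    phase0 = add(p[0], shift(p[1], dr, dc))       # H0X0 + z^-1 H1X1
    phase1 = add(add(p[2], p[0], -1), p[1], -1)   # (H0+H1)(X0+X1) - H0X0 - H1X1
    return [phase0, phase1]

def fast_block_conv2x2(x, h):
    """2D convolution via 2x2 polyphase phases and 9 sub-convolutions."""
    X, H = pre_add(polyphase(x)), pre_add(polyphase(h))
    P = [[conv2d(H[u][v], X[u][v]) for v in range(3)] for u in range(3)]
    R = [post_add(P[u], 0, 1) for u in range(3)]  # recombine along columns
    y = {}
    for q in range(2):
        col_terms = [R[0][q], R[1][q], R[2][q]]
        Y = post_add(col_terms, 1, 0)             # recombine along rows
        for p in range(2):
            for (m, n), v in Y[p].items():
                y[(2 * m + p, 2 * n + q)] = v     # interleave output phases
    return y

def same(a, b):
    """Compare two sparse arrays, treating missing entries as zero."""
    return all(a.get(k, 0) == b.get(k, 0) for k in set(a) | set(b))

random.seed(1)
image = {(i, j): random.randint(-9, 9) for i, j in product(range(8), repeat=2)}
kernel = {(i, j): random.randint(-4, 4) for i, j in product(range(4), repeat=2)}
print(same(fast_block_conv2x2(image, kernel), conv2d(image, kernel)))
```

Each of the nine products convolves a quarter-size sub-kernel with a quarter-size sub-image, and the four interleaved output phases correspond to the four pixels of one \(2\times 2\) output block. The extra cost appears as the additions in `pre_add` and `post_add`.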

Change history

18 October 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00034-021-01863-x

Acknowledgements

We thank Dr. Anish Turlapaty, Assistant Professor, Indian Institute of Information Technology, Sri City, for the discussion and comments that greatly improved the manuscript.

Author information

Authors and Affiliations

ECE Department, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India
Anitha Arumalla
ECE Department, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, India
Madhavi Latha Makkena

Corresponding author

Correspondence to Anitha Arumalla.

Ethics declarations

Code availability

The authors declare that the custom code developed to verify the correctness of the architecture is available from the authors on reasonable request.

Availability of data and material

The authors declare that all data and materials, including the custom code, that support the work claimed in the manuscript are available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to fix errors in the mathematical symbols.

About this article

Cite this article

Arumalla, A., Makkena, M.L. Efficient Architecture for Block Parallel Convolution using Two-Dimensional Polyphase Decomposition. Circuits Syst Signal Process 41, 1166–1186 (2022). https://doi.org/10.1007/s00034-021-01811-9

Received: 10 February 2021
Revised: 22 July 2021
Accepted: 23 July 2021
Published: 16 August 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s00034-021-01811-9
