Abstract
Variable-Length Encoding (VLE) is a process of reducing input data size by replacing fixed-length data words with codewords of shorter length. As VLE is one of the main building blocks in systems for multimedia compression, its efficient implementation is essential. The massively parallel architecture of modern general purpose graphics processing units (GPGPUs) has been successfully used for acceleration of inherently parallel compression blocks, such as image transforms and motion estimation. On the other hand, VLE is an inherently serial process due to the requirement of writing a variable number of bits for each codeword to the compressed data stream. The introduction of the atomic operations on the latest GPGPUs enables writing to the output memory locations by many threads in parallel. We present a novel data parallel algorithm for variable length encoding using atomic operations, which archives performance speedups of up to 35-50x using a CUDA-enabled GPGPU.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Huffman, D.: A method for the construction of Minimum-Redundancy codes. Proceedings of the IRE 40(9), 1098–1101 (1952)
Allusse, Y., Horain, P., Agarwal, A., Saipriyadarshan, C.: GpuCV: an opensource GPU-accelerated framework forimage processing and computer vision. In: 2006 IEEE International Conference on Multimedia and Expo. (2008)
Chen, W., Hang, H.: H. 264/AVC motion estimation implmentation on Compute Unified Device Architecture (CUDA). In: 2008 IEEE International Conference on Multimedia and Expo., pp. 697–700 (2008)
Fung, J., Mann, S.: Using graphics devices in reverse: GPU-based image processing and computer vision. In: 2008 IEEE International Conference on Multimedia and Expo., pp. 9–12 (2008)
Blelloch, G.E.: Prefix sums and their applications. Synthesis of Parallel Algorithms, 35–60 (1990)
Roger, D., Assarsson, U., Holzschuch, N.: Efficient stream reduction on the gpu. In: Kaeli, D., Leeser, M. (eds.) Workshop on General Purpose Processing on Graphics Processing Units (October 2007)
Castaño, I.: High quality dxt compression using cuda. Technical report, NVIDIA (last access, May 2008)
Lindholm, E., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro 28(2), 39–55 (2008)
NVIDIA Corporation Technical Staff: Nvidia cuda -programming guide 2.0. Technical report, NVIDIA (last access, May 2009)
Atallah, M., Kosaraju, S., Larmore, L., Miller, G., Teng, S.: Constructing trees in parallel. In: Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, pp. 421–431. ACM, New York (1989)
Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with cuda. GPU Gems 3 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balevic, A. (2010). Parallel Variable-Length Encoding on GPGPUs. In: Lin, HX., et al. Euro-Par 2009 – Parallel Processing Workshops. Euro-Par 2009. Lecture Notes in Computer Science, vol 6043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14122-5_6
Download citation
DOI: https://doi.org/10.1007/978-3-642-14122-5_6
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14121-8
Online ISBN: 978-3-642-14122-5
eBook Packages: Computer ScienceComputer Science (R0)