A Benchmark Framework for Data Compression Techniques

Damme, Patrick; Habich, Dirk; Lehner, Wolfgang

doi:10.1007/978-3-319-31409-9_6


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9508)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking


Abstract

Lightweight data compression is frequently applied in main memory database systems to improve query performance. The data processed by such systems is highly diverse. Moreover, there is a high number of existing lightweight compression techniques. Therefore, choosing the optimal technique for a given dataset is non-trivial. Existing approaches are based on simple rules, which do not suffice for such a complex decision. In contrast, our vision is a cost-based approach. However, this requires a detailed cost model, which can only be obtained from a systematic benchmarking of many compression algorithms on many different datasets. A naïve benchmark evaluates every algorithm under consideration separately. This yields many redundant steps and is thus inefficient. We propose an efficient and extensible benchmark framework for compression techniques. Given an ensemble of algorithms, it minimizes the overall run time of the evaluation. We experimentally show that our approach outperforms the naïve approach.


Notes

1. The source code of our framework can be downloaded at https://wwwdb.inf.tu-dresden.de/team/staff/patrick-damme-msc/.
2. Note that, alternatively, a transformation from some already reachable node to X could be added. This could be especially useful, since transformations are faster than compressions in many cases. However, finding the fastest way to make X reachable would require a cost model for the algorithms, which only becomes available after the systematic benchmarking.
3. The compressions can be executed in an arbitrary order. The same applies to the decompressions. However, the transformations cannot, in general, be applied in an arbitrary order, since a transformation could require a source format that is not present after all compressions in \(\mathcal {A}^+\) have been executed, as is the case for 4G-2-4Ns in our example.
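
The ordering constraint in Note 3 can be illustrated with a minimal sketch. It is not taken from the framework's source code: the function name `order_transformations`, the tuple layout, and the lowercase format names (`rle`, `4g`, `4ns`) are assumptions made for illustration only. The idea is to run all compressions first and then repeatedly schedule those transformations whose source format is already present.

```python
def order_transformations(compressed_formats, transformations):
    """Return transformation names in an order in which every source format
    is already present when its transformation runs.

    compressed_formats: formats present after all compressions have run
                        (hypothetical names, for illustration only).
    transformations:    list of (name, source_format, target_format) tuples.
    """
    available = set(compressed_formats)
    pending = list(transformations)
    ordered = []
    while pending:
        # Transformations whose source format already exists can run now.
        runnable = [t for t in pending if t[1] in available]
        if not runnable:
            missing = sorted({t[1] for t in pending})
            raise ValueError(f"unreachable source formats: {missing}")
        for name, _src, dst in runnable:
            ordered.append(name)
            available.add(dst)  # the target format is now present as well
        pending = [t for t in pending if t not in runnable]
    return ordered


# Assumed scenario: only "rle" is produced by a compression, so the
# transformation that reads "4g" has to wait for the one that creates it.
compressed = ["rle"]
transformations = [
    ("4g-2-4ns", "4g", "4ns"),
    ("rle-2-4g", "rle", "4g"),
]
print(order_transformations(compressed, transformations))
# prints ['rle-2-4g', '4g-2-4ns']
```

If some source format can never be produced, the sketch raises an error instead of guessing which algorithm to add. This mirrors Note 2: choosing the cheapest way to make a format reachable would need the cost model that the benchmark is meant to produce.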


Acknowledgments

This work was funded by the German Research Foundation (DFG) in the context of the project “Lightweight Compression Techniques for the Optimization of Complex Database Queries” (LE-1416/26-1).

Author information

Authors and Affiliations

Database Systems Group, Technische Universität Dresden, 01062, Dresden, Germany
Patrick Damme, Dirk Habich & Wolfgang Lehner

Corresponding author

Correspondence to Patrick Damme.

Editor information

Editors and Affiliations

Cisco Systems, Inc., San Jose, CA, USA
Raghunath Nambiar
Oracle Corporation, Redwood City, CA, USA
Meikel Poess

About this paper

Cite this paper

Damme, P., Habich, D., Lehner, W. (2016). A Benchmark Framework for Data Compression Techniques. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking: Traditional to Big Data to Internet of Things. TPCTC 2015. Lecture Notes in Computer Science, vol 9508. Springer, Cham. https://doi.org/10.1007/978-3-319-31409-9_6

DOI: https://doi.org/10.1007/978-3-319-31409-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31408-2
Online ISBN: 978-3-319-31409-9
eBook Packages: Computer Science, Computer Science (R0)