VarDB: High-Performance Warehouse Processing with Massive Ordering and Binary Search

Martins, Pedro; Costa, João; Cecílio, José; Furtado, Pedro

doi:10.1007/978-3-642-23544-3_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6862)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Current database management systems (DBMS) compete aggressively on performance. To that end, they are adopting new storage schemas, developing better compression algorithms, using faster hardware, and optimizing parallel and distributed data processing. Current row-wise systems do not exploit massive ordering redundancy, and current column-wise approaches exploit it only partially. An important current research issue is replacing complex optimization and processing with simpler but ultra-fast solutions. We propose the VarDB approach to optimize performance over data warehouses. The solution minimizes complex operators by applying a simple scheme and organizing all structures and processing to that end: massive ordering with efficient sorting and log₂ N searching. For data warehouses, with periodic loads and frequent analysis operations, such an approach provides very fast query processing. We show how this massive data ordering/sorting can be used to optimize queries for high speed, even without data compression (thereby also avoiding compression/decompression overheads). We focus on sorting columns of data and correlating them with other replicated, unsorted columns. For querying, we focus on binary search and rely mainly on offsets. Using a simple disk-based prototype, we tested data loading, sorting versus index creation, and very selective operations such as data filtering and joining. The results show much better performance than optimized row-wise engines, and also improvements over optimized column-wise engines: compared with the latter, we attained at least similar performance for many queries and much better performance for queries with complex joins.
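The core idea described in the abstract (a sorted copy of a column, binary search over it, and offsets back into unsorted columns) can be illustrated with a minimal in-memory sketch. This is not the authors' disk-based prototype: the function names `build_sorted_column` and `range_filter`, the two-column table, and its values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def build_sorted_column(values):
    """Return a sorted copy of the column plus, for each sorted entry,
    the offset (row position) it had in the original unsorted column."""
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def range_filter(sorted_values, row_offsets, low, high):
    """Return the row offsets whose value lies in [low, high].
    Two binary searches locate the boundaries of the matching run."""
    start = bisect_left(sorted_values, low)
    end = bisect_right(sorted_values, high)
    return row_offsets[start:end]


# Hypothetical table with two columns stored separately (assumed data).
prices = [30, 10, 50, 20, 40]
cities = ["Porto", "Lisboa", "Coimbra", "Braga", "Faro"]

# Sort one column once (at load time); keep offsets to the other columns.
sorted_prices, price_offsets = build_sorted_column(prices)

# Selective filter: 20 <= price <= 40, answered by binary search.
rows = range_filter(sorted_prices, price_offsets, 20, 40)

# Follow the offsets into the unsorted column to fetch the other attribute.
matching_cities = [cities[r] for r in rows]
```

Under these assumptions the sort cost is paid once per load, and each selective filter then needs only two binary searches plus offset lookups, which matches the workload the abstract targets (periodic loads, frequent analysis).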

Author information

Authors and Affiliations

University of Coimbra, Coimbra, Portugal
Pedro Martins, João Costa, José Cecílio & Pedro Furtado

Editor information

Editors and Affiliations

ICAR-CNR and University of Calabria, Via P. Bucci 41 C, 87036, Rende (CS), Italy
Alfredo Cuzzocrea
Hewlett-Packard Labs, 1501 Page Mill Road, MS 1142, 94304, Palo Alto, CA, USA
Umeshwar Dayal

About this paper

Cite this paper

Martins, P., Costa, J., Cecílio, J., Furtado, P. (2011). VarDB: High-Performance Warehouse Processing with Massive Ordering and Binary Search. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2011. Lecture Notes in Computer Science, vol 6862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23544-3_14

DOI: https://doi.org/10.1007/978-3-642-23544-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23543-6
Online ISBN: 978-3-642-23544-3
eBook Packages: Computer Science, Computer Science (R0)