VarDB: High-Performance Warehouse Processing with Massive Ordering and Binary Search

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6862)

Abstract

Current database management systems (DBMS) compete aggressively on performance. To that end, they adopt new storage schemas, develop better compression algorithms, use faster hardware, and optimize parallel and distributed data processing. Current row-wise systems do not exploit massive ordering redundancy, and current column-wise approaches exploit it only partially. An important current research issue is replacing optimization and processing complexity with less complex but ultra-fast solutions. We propose the varDB approach to optimize performance over data warehouses. The solution minimizes complex operators by applying a simple scheme and organizing all structures and processing to that end: massive ordering with efficient sorting and log₂ N searching. For data warehouses, with periodic loads and frequent analysis operations, such an approach provides very fast query processing. In our work we show how this massive data ordering/sorting can be used to optimize queries for high speed, even without data compression (and therefore without compression/decompression overheads). We focus on sorting columns of data and correlating them with other replicated, unsorted columns. For querying, we focus on binary search and rely mainly on offsets. Our tests of loading data, sorting versus creating indexes, and executing very selective operations such as data filtering and joining show, using a simple disk-based prototype, that we obtain much better performance than optimized row-wise engines, and also improvements over optimized column-wise engines. Compared with the latter, we attained at least similar performance for many queries and much better performance for queries with complex joins.
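As an illustration only, the idea of a sorted column that is searched by binary search and linked to unsorted columns through offsets might be sketched as follows. This is not the varDB implementation: it is a minimal in-memory sketch, and the names `build_sorted_column`, `range_filter`, and the toy `price` and `region` columns are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_sorted_column(values):
    """Return (sorted_values, offsets).

    offsets[i] is the row position, in the original unsorted table,
    of the value stored at sorted_values[i].
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order


def range_filter(sorted_values, offsets, low, high):
    """Return the original row positions whose value lies in [low, high].

    Two binary searches locate the boundaries of the matching run in the
    sorted column; the offsets in that run point back to the table rows.
    """
    start = bisect_left(sorted_values, low)
    stop = bisect_right(sorted_values, high)
    return offsets[start:stop]


# Hypothetical toy table with two columns stored in load order.
price = [30, 10, 50, 20, 40]
region = ["N", "S", "E", "W", "N"]

# Sort one column once (for example at load time) and keep the offsets.
sorted_price, price_offsets = build_sorted_column(price)

# Selective filter: 20 <= price <= 40, answered by binary search.
rows = range_filter(sorted_price, price_offsets, 20, 40)

# Use the offsets to fetch values from the unsorted column.
matching_regions = [region[r] for r in rows]
```

In this sketch the sort cost is paid once at load time, while each selective filter needs only two binary searches plus one offset lookup per matching row, which mirrors the periodic-load, frequent-query pattern the abstract describes.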

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martins, P., Costa, J., Cecílio, J., Furtado, P. (2011). VarDB: High-Performance Warehouse Processing with Massive Ordering and Binary Search. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2011. Lecture Notes in Computer Science, vol 6862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23544-3_14

  • DOI: https://doi.org/10.1007/978-3-642-23544-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23543-6

  • Online ISBN: 978-3-642-23544-3

  • eBook Packages: Computer Science, Computer Science (R0)
