Skip to main content

Data Lake

  • Living reference work entry
  • First Online:
Encyclopedia of Big Data Technologies

Definition

A data lake is a data repository in which datasets from multiple sources are stored in their original structures. It should provide functions to extract data and metadata from heterogeneous sources and to ingest them into a hybrid storage system. In addition, a data lake should offer a data transformation engine, in which datasets can be transformed, cleaned, and integrated with other datasets. Finally, interfaces to explore and to query the data and metadata of a data lake should be also available in a data lake system.

Overview

The term “data lake” (DL) was first mentioned by James Dixon in 2010 in a blog post (https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/) where he put data marts on the same level as bottled water, which is cleansed, packaged, and structured for easy consumption. In contrast, a data lake manages the raw data as it is ingested from the data sources.

In the initial article (and in a later, more detailed article (https://jamesdixon.wordpress.com/2014/09/25/data-lakes-revisited/...

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Christoph Quix .

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Check for updates. Verify currency and authenticity via CrossMark

Cite this entry

Quix, C., Hai, R. (2018). Data Lake. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_7-1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_7-1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference MathematicsReference Module Computer Science and Engineering

Publish with us

Policies and ethics