Definition
Apache Hadoop is an open-source platform for storing and efficiently processing large datasets on a cluster of computers. The framework provides fault tolerance, high availability, and scalability, and it can process petabytes of data. Its principal components are MapReduce and HDFS.
Overview
Introduction
Apache Hadoop is a distributed framework for processing Big Data. It is a software platform with a master/worker architecture and three main components: HDFS, YARN, and MapReduce. HDFS (Hadoop Distributed File System) is an abstraction layer responsible for storing data. MapReduce is the data processing framework, designed specifically to scale and to run in a distributed fashion. YARN (Yet Another Resource Negotiator) is a management platform responsible for handling the cluster's resources. Hadoop is written in Java and distributed under the Apache License 2.0.
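To make the MapReduce processing model concrete, the following is a minimal single-process sketch of a word count, written in Python for brevity. It does not use the Hadoop API: the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical, and the grouping step that Hadoop performs across the cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (simulated in memory, not distributed)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map, shuffle, and reduce steps one after another."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its input from HDFS.
    sample = ["big data needs big clusters", "hadoop stores data"]
    print(word_count(sample))
```

In a real deployment, the map and reduce functions would run as many parallel tasks on different worker nodes, with input read from HDFS and resources allocated by YARN. The sketch only illustrates the data flow between the phases.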
The Hadoop framework can be...
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
de Souza Granha, R.G.D. (2019). Hadoop. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_36
DOI: https://doi.org/10.1007/978-3-319-77525-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering