Abstract
Data plays a pivotal role in business growth and is regarded as an organizational asset. This is most evident in enterprises that preserve and mine data to discover knowledge. Data that grows exponentially and is characterized by volume, velocity, and variety is termed big data. Mining such voluminous data can yield comprehensive business intelligence for strategic decision-making. The emergence of cloud computing, the parallel processing power of servers, and distributed programming frameworks such as Hadoop, with its “MapReduce” programming paradigm, pave the way for mining massive-scale data. The data mining domain is rich in algorithms for discovering trends in data, but mining big data is beyond the capability of conventional data mining techniques. The unprecedented growth of data calls for a platform that supports effective, real-time analysis with fast response. In this paper, we present an overview of big data, the mechanisms and algorithms used to mine it, and the environments and tools needed to execute them. The rationale is that big data mining is needed in sectors such as finance, biology, healthcare, banking, insurance, and environmental research, to name a few. A review of the various aspects of big data mining can help readers gain know-how in the context of globalization and business collaborations, where mining cross-organization data is essential. This paper also sheds light on the relationship among big data, cloud computing, Hadoop, and big data storage systems. In future work, we intend to propose and implement algorithms for big data mining.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Radhika, D., Aruna Kumari, D. (2018). Adding Big Value to Big Businesses: A Present State of the Art of Big Data, Frameworks and Algorithms. In: Saini, A., Nayak, A., Vyas, R. (eds) ICT Based Innovations. Advances in Intelligent Systems and Computing, vol 653. Springer, Singapore. https://doi.org/10.1007/978-981-10-6602-3_17
DOI: https://doi.org/10.1007/978-981-10-6602-3_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6601-6
Online ISBN: 978-981-10-6602-3
eBook Packages: Engineering, Engineering (R0)