To Optimize Graph Based Power Iteration for Big Data Based on MapReduce Paradigm

Jayalatchumy, Dhanapal; Thambidurai, Perumal

doi:10.1007/978-3-319-26832-3_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9468)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration

Abstract

Big Data is the next big thing in the IT world. The value contained in stored and processed Big Data cannot be extracted using traditional computing techniques. The main aim of this paper is to design a scalable machine learning algorithm that scales up and speeds up clustering without losing accuracy. Clustering using power iteration is fast and scalable; however, it requires matrix computations that make the algorithm infeasible for Big Data. Moreover, the power method can converge slowly, since its convergence rate depends on the ratio between the two largest eigenvalues. Hence, this paper investigates the convergence factor and applies a modified constraint that reduces the computational cost by making the algorithm converge faster. The proposed algorithm is evaluated in a MapReduce parallel environment on datasets of different sizes and on clusters with different numbers of nodes, using speedup, scalability, and efficiency as the indicators. Its performance is reported in terms of execution time and number of nodes. The results show that the proposed method is feasible and valid, and that it improves the overall performance and efficiency of the algorithm enough to meet the needs of large-scale processing.
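The abstract combines two ideas: power iteration clustering, where repeated multiplication by a row-normalised affinity matrix yields a one-dimensional embedding that is then clustered, and MapReduce, where each matrix-vector product can be split into a map step and a reduce step. The sketch below is a minimal single-machine illustration under stated assumptions, not the authors' implementation. The modified convergence constraint proposed in the paper is not described in this abstract, so the loop uses the acceleration-based stopping rule of Lin and Cohen's original power iteration clustering (stop when the change in the per-node update drops below 1e-5/n). All helper names (`map_phase`, `reduce_phase`, `power_iteration_embedding`, `kmeans_1d`, `speedup`, `efficiency`) are hypothetical, the map and reduce phases are plain Python functions standing in for Hadoop jobs, and every node is assumed to have a positive degree. The last two helpers use the usual textbook definitions of speedup and efficiency, which may differ in detail from those used in the paper.

```python
import numpy as np
from collections import defaultdict


def map_phase(W, v):
    """Hypothetical map step: emit (row i, W[i, j] * v[j]) for every nonzero W[i, j]."""
    rows, cols = np.nonzero(W)
    for i, j in zip(rows, cols):
        yield int(i), W[i, j] * v[j]


def reduce_phase(pairs, n):
    """Hypothetical reduce step: sum the partial products that share a row key."""
    sums = defaultdict(float)
    for i, value in pairs:
        sums[i] += value
    return np.array([sums[i] for i in range(n)])


def power_iteration_embedding(A, max_iter=1000, eps=None):
    """1-D power iteration embedding with Lin & Cohen's acceleration-based stop."""
    n = A.shape[0]
    degree = A.sum(axis=1)              # assumes every node has positive degree
    W = A / degree[:, None]             # row-normalised affinity, D^-1 A
    v = degree / degree.sum()           # start from the normalised degree vector
    eps = 1e-5 / n if eps is None else eps
    delta_prev = None
    for _ in range(max_iter):
        wv = reduce_phase(map_phase(W, v), n)   # one map + reduce round
        v_next = wv / np.abs(wv).sum()          # L1 normalisation
        delta = np.abs(v_next - v)              # per-node "velocity"
        v = v_next
        # Stop once the velocity itself stops changing (acceleration <= eps).
        if delta_prev is not None and np.max(np.abs(delta - delta_prev)) <= eps:
            break
        delta_prev = delta
    return v


def kmeans_1d(x, k, iters=100):
    """Plain Lloyd's k-means on a one-dimensional embedding."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))  # spread initial centres
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            x[labels == c].mean() if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def speedup(t_single, t_parallel):
    """Speedup S_p = T_1 / T_p."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, nodes):
    """Efficiency E_p = S_p / p."""
    return speedup(t_single, t_parallel) / nodes


if __name__ == "__main__":
    # Toy graph: a 3-node clique and a 2-node clique joined by weak 0.01 edges.
    A = np.full((5, 5), 0.01)
    A[:3, :3] = 1.0
    A[3:, 3:] = 1.0
    np.fill_diagonal(A, 0.0)
    print(kmeans_1d(power_iteration_embedding(A), k=2))
```

Each call to `reduce_phase(map_phase(W, v), n)` corresponds to one MapReduce round, so any constraint that stops the loop after fewer iterations removes whole rounds of distributed work. On the toy graph the two cliques should receive different labels. The cliques deliberately have different sizes: if every node had the same degree, the degree-based starting vector would already be a fixed point of `W` and the embedding could not separate anything.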

Author information

Authors and Affiliations

Department of CSE, PKIET, Karaikal, India
Dhanapal Jayalatchumy
PKIET, Karaikal, India
Perumal Thambidurai

Corresponding author

Correspondence to Dhanapal Jayalatchumy.

Editor information

Editors and Affiliations

Norwegian Univ. of Science & Technology, Trondheim, Norway
Rajendra Prasath
Intl Inst of Info Tech Hyderabad, Hyderabad, India
Anil Kumar Vuppala
V.H.N.S.N.College (Autonomous), Virudhunagar, Tamil Nadu, India
T. Kathirvalavakumar

About this paper

Cite this paper

Jayalatchumy, D., Thambidurai, P. (2015). To Optimize Graph Based Power Iteration for Big Data Based on MapReduce Paradigm. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_35

DOI: https://doi.org/10.1007/978-3-319-26832-3_35
Published: 03 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)
