To Optimize Graph Based Power Iteration for Big Data Based on MapReduce Paradigm

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9468)

Abstract

Big Data is the next big thing in the IT world. The values generated by storing and processing Big Data cannot be analyzed using traditional computing techniques. The main aim of this paper is to design a scalable machine learning algorithm that scales up and speeds up clustering without losing accuracy. Clustering using power iteration is fast and scalable; however, it requires matrix computations that make the algorithm infeasible for Big Data. Moreover, the power method can converge slowly, because its rate of convergence depends on the gap between the leading eigenvalues. This paper therefore investigates the convergence factor and applies a modified constraint that reduces the computational cost by making the algorithm converge more quickly. The proposed algorithm is evaluated in a MapReduce parallel environment for Big Data, using datasets of different sizes on clusters with different numbers of nodes, with speedup, scalability, and efficiency as the performance indicators. Its performance is reported in terms of execution time against the number of nodes. The results show that the proposed method is feasible and valid, and that it improves the overall performance and efficiency of the algorithm, meeting the needs of large-scale processing.
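The abstract does not spell out the modified convergence constraint, so the sketch below shows only the baseline it builds on: power iteration clustering (PIC) in the style of Lin and Cohen, with each matrix-vector product W·v written as a simulated map phase (emit one partial product per nonzero entry, keyed by row) and reduce phase (sum the partial products per row). It assumes a Gaussian affinity with a hypothetical bandwidth `sigma`, a degree-proportional starting vector, a stop when the per-element movement stops changing (threshold 1e-5/n), and plain 1-D k-means on the resulting embedding. All function names are illustrative; this is not the authors' implementation and not a Hadoop API.

```python
import math


def affinity_matrix(points, sigma=1.0):
    """Gaussian affinity A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), with A[i][i] = 0."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
                a[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return a


def map_phase(w, v):
    """Simulated map step: for every nonzero W[i][j], emit the pair (i, W[i][j] * v[j])."""
    for i, row in enumerate(w):
        for j, w_ij in enumerate(row):
            if w_ij != 0.0:
                yield i, w_ij * v[j]


def reduce_phase(pairs, n):
    """Simulated reduce step: sum all partial products sharing row key i, giving (W v)[i]."""
    acc = [0.0] * n
    for i, value in pairs:
        acc[i] += value
    return acc


def power_iteration_embedding(points, sigma=1.0, max_iter=1000):
    """Truncated power iteration on W = D^-1 A with an acceleration-based stop (PIC-style)."""
    a = affinity_matrix(points, sigma)
    n = len(a)
    degree = [sum(row) for row in a]  # assumes every point has a nonzero degree
    w = [[a_ij / degree[i] for a_ij in a[i]] for i in range(n)]  # row-normalised W
    total = sum(degree)
    v = [d / total for d in degree]  # degree-proportional start vector (assumption)
    eps = 1e-5 / n  # threshold on the change in per-step movement (assumption)
    delta = None
    for _ in range(max_iter):
        wv = reduce_phase(map_phase(w, v), n)  # one simulated map + reduce per iteration
        norm = sum(abs(x) for x in wv)
        v_new = [x / norm for x in wv]  # L1 normalisation
        delta_new = [abs(x - y) for x, y in zip(v_new, v)]
        # Stop once the per-element movement itself stops changing ("acceleration" ~ 0).
        converged = delta is not None and max(
            abs(x - y) for x, y in zip(delta_new, delta)) <= eps
        v, delta = v_new, delta_new
        if converged:
            break
    return v


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on the one-dimensional embedding."""
    s = sorted(values)
    # Deterministic initial centres spread over the sorted values.
    centres = [s[(len(s) - 1) * c // max(k - 1, 1)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centres[c])) for x in values]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels


def power_iteration_clustering(points, k, sigma=1.0):
    """Embed with truncated power iteration, then cluster the embedding with k-means."""
    return kmeans_1d(power_iteration_embedding(points, sigma), k)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print(power_iteration_clustering(pts, k=2))
```

Stopping early is the point of PIC: the iterate settles to a nearly constant value inside each cluster well before it collapses to the global constant eigenvector, so the early-stopped vector already separates the groups. How quickly that happens is exactly the convergence behaviour the paper's modified constraint targets.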

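The abstract names speedup, scalability, and efficiency as its indicators but gives no formulas. A minimal sketch, assuming the textbook definitions: speedup S_p = T_1 / T_p, efficiency E_p = S_p / p, and scaleup as the time for a dataset of size D on one node divided by the time for a dataset of size p·D on p nodes. The function names and the timings in the usage block are hypothetical and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S_p = T_1 / T_p: time on one node divided by time on p nodes."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Efficiency E_p = S_p / p; 1.0 means perfectly linear speedup."""
    return speedup(t_serial, t_parallel) / p


def scaleup(t_small_one_node, t_large_p_nodes):
    """Scaleup = T(D, 1 node) / T(p*D, p nodes); 1.0 means p times the data took no longer on p nodes."""
    return t_small_one_node / t_large_p_nodes


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds for one fixed dataset (not from the paper).
    timings = {1: 120.0, 2: 64.0, 4: 40.0}
    for p, t in timings.items():
        s = speedup(timings[1], t)
        e = efficiency(timings[1], t, p)
        print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

Speedup and efficiency hold the dataset fixed and vary the node count, while scaleup grows the data in proportion to the nodes; reporting both separates "the same job gets faster" from "a bigger job stays just as fast".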
Author information

Corresponding author

Correspondence to Dhanapal Jayalatchumy.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jayalatchumy, D., Thambidurai, P. (2015). To Optimize Graph Based Power Iteration for Big Data Based on MapReduce Paradigm. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_35

  • DOI: https://doi.org/10.1007/978-3-319-26832-3_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26831-6

  • Online ISBN: 978-3-319-26832-3

  • eBook Packages: Computer Science, Computer Science (R0)