Running Data Mining Applications on the Grid: A Bag-of-Tasks Approach

da Silva, Fabrício A. B.; Carvalho, Sílvia; Senger, Hermes; Hruschka, Eduardo R.; de Farias, Cléver R. G.

doi:10.1007/978-3-540-24709-8_18


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3044)

Included in the following conference series:

International Conference on Computational Science and Its Applications

880 Accesses
4 Citations

Abstract

Data mining (DM) applications are composed of computing-intensive processing tasks working on huge datasets. Due to their computing-intensive nature, these applications are natural candidates for execution on high-performance, high-throughput platforms such as PC clusters and computational grids. Many data mining algorithms can be implemented as bag-of-tasks (BoT) applications, i.e., parallel applications composed of independent tasks. This paper discusses the use of computational grids for the execution of DM algorithms as BoT applications, investigates the scalability of one such application, and proposes an approach to improve its scalability.
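The abstract defines a BoT application as a parallel application composed of independent tasks. The Python sketch below illustrates that idea on a hypothetical workload that is not taken from the paper: a parameter sweep of k-means clustering in which every (k, start point) pair is a separate task. The names `kmeans_task` and `run_bag`, the farthest-first initialization, and the use of a local `ThreadPoolExecutor` as a stand-in for grid machines are all assumptions made for illustration. On a real grid each task would be shipped to a remote node by the grid scheduler; local threads give no CPU speedup here because of Python's global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple

# Hypothetical BoT workload for illustration only; not the paper's application.

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_task(points: List[Point], k: int, start: int, iters: int = 20) -> float:
    """One independent task: k-means from a farthest-first start, returning the SSE.

    The task depends only on its arguments and shares no state with other
    tasks, so tasks can run anywhere, in any order.
    """
    centroids = [points[start]]
    while len(centroids) < k:
        # Farthest-first (assumed) initialization: add the point farthest from all chosen centroids.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old one if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def run_bag(points: List[Point], ks: List[int], workers: int = 4) -> Dict[int, float]:
    """Master: submit one task per (k, start) pair and keep the lowest SSE per k.

    Results are folded in completion order, which a BoT master cannot predict.
    """
    best: Dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:  # stand-in for grid nodes
        futures = {
            pool.submit(kmeans_task, points, k, start): k
            for k in ks
            for start in range(len(points))
        }
        for fut in as_completed(futures):
            k = futures[fut]
            best[k] = min(best.get(k, float("inf")), fut.result())
    return best


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    results = run_bag(data, [1, 2, 3])
    for k in sorted(results):
        print(k, results[k])
```

Because no task reads another task's result, the master can submit them all at once and combine results in whatever order they finish; that independence is what allows a BoT workload to be spread over as many machines as are available. For the eight points in the usage example, which form two well-separated groups of four, every k = 2 task converges to the two group means and reports an SSE of 4.0.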

Author information

Authors and Affiliations

Universidade Católica de Santos (UniSantos), R. Dr. Carvalho de Mendonça, 144, CEP 11030-906, Santos, SP, Brazil
Fabrício A. B. da Silva, Sílvia Carvalho, Hermes Senger, Eduardo R. Hruschka & Cléver R. G. de Farias

Editor information

Editors and Affiliations

Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary AB, Canada
Marina L. Gavrilova
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

da Silva, F.A.B., Carvalho, S., Senger, H., Hruschka, E.R., de Farias, C.R.G. (2004). Running Data Mining Applications on the Grid: A Bag-of-Tasks Approach. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24709-8_18

DOI: https://doi.org/10.1007/978-3-540-24709-8_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22056-5
Online ISBN: 978-3-540-24709-8
eBook Packages: Springer Book Archive