
On the Effectiveness of Distributed Learning on Different Class-Probability Distributions of Data

  • Conference paper
Advances in Artificial Intelligence (CAEPIA 2011)

Abstract

The unrestrained growth of data in many domains where machine learning could be applied has given rise to a new field, large-scale learning, which aims to develop algorithms that are efficient and scalable with respect to computation, memory, time and communication requirements. A promising line of research for large-scale learning is distributed learning. It involves learning from data stored at different locations and, eventually, selecting and combining the "local" classifiers to obtain a single global answer using one of three main approaches. This paper addresses a significant issue that arises when distributed data come from several sources, each with a different distribution. The class-probability distribution of data (CPDD) is defined, and its impact on the performance of the three combination approaches is analyzed. The results show that the CPDD must be taken into account, and that combining only related knowledge is the most appropriate way to learn in a distributed setting.
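
The abstract does not say how the CPDD is measured or how "related" sources are chosen, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a node's CPDD is the vector of relative class frequencies in its local data, that two nodes are "related" when the Euclidean distance between these vectors is below a hypothetical threshold, and that local predictions are fused by simple majority voting (one common combination scheme, not necessarily one of the three approaches studied in the paper). All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


# Hypothetical helper: a node's class-probability distribution, taken here
# to be the relative frequency of each class in that node's local labels.
def class_distribution(labels, classes):
    if not labels:
        raise ValueError("a node must hold at least one labelled example")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


# Euclidean distance between two class distributions (an assumed measure).
def distance(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical relatedness rule: indices of nodes whose distribution lies
# within `threshold` of the reference node's distribution.
def related_nodes(node_labels, classes, reference, threshold):
    dists = [class_distribution(labels, classes) for labels in node_labels]
    ref = dists[reference]
    return [i for i, d in enumerate(dists) if distance(d, ref) <= threshold]


# Simple majority vote over local predictions for one sample; on a tie,
# Counter.most_common returns the label that was encountered first.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]


# Fuse only the predictions of nodes related to the reference node.
def combine_related(predictions, node_labels, classes, reference, threshold):
    keep = related_nodes(node_labels, classes, reference, threshold)
    return majority_vote([predictions[i] for i in keep])


# Toy data: five nodes with skewed class distributions (invented, not from the paper).
classes = ["a", "b"]
node_labels = [
    ["a"] * 8 + ["b"] * 2,  # node 0
    ["a"] * 7 + ["b"] * 3,  # node 1, similar to node 0
    ["a"] * 1 + ["b"] * 9,  # node 2
    ["a"] * 2 + ["b"] * 8,  # node 3
    ["b"] * 10,             # node 4
]
# Each local classifier's prediction for the same query sample.
predictions = ["a", "a", "b", "b", "b"]

print(related_nodes(node_labels, classes, reference=0, threshold=0.2))  # [0, 1]
print(majority_vote(predictions))                                       # b
print(combine_related(predictions, node_labels, classes, 0, 0.2))      # a
```

Under these assumptions, voting over all five nodes returns "b", while voting over only the nodes whose class distribution resembles node 0's returns "a". This illustrates the mechanism behind combining only related knowledge; on its own it says nothing about which choice is more accurate.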

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peteiro-Barral, D., Guijarro-Berdiñas, B., Pérez-Sánchez, B. (2011). On the Effectiveness of Distributed Learning on Different Class-Probability Distributions of Data. In: Lozano, J.A., Gámez, J.A., Moreno, J.A. (eds) Advances in Artificial Intelligence. CAEPIA 2011. Lecture Notes in Computer Science, vol 7023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25274-7_12

  • DOI: https://doi.org/10.1007/978-3-642-25274-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25273-0

  • Online ISBN: 978-3-642-25274-7

  • eBook Packages: Computer Science, Computer Science (R0)