Choice of Best Samples for Building Ensembles in Dynamic Environments

  • Conference paper
Engineering Applications of Neural Networks (EANN 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 629)

Abstract

Machine learning approaches often focus on optimizing the algorithm rather than ensuring that the source data is as rich as possible. However, when the input examples used to construct models can be enhanced, that option deserves careful consideration. In this work, we propose a technique that uses dynamic ensembles to define the best set of training examples in text classification scenarios. In dynamic environments, where new data appears constantly, old data is usually discarded, yet some of the discarded examples may still carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. The examples deemed most relevant are kept for a longer time window than the rest. Results in a Twitter scenario show that keeping these examples improves the final classification performance.
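The abstract does not state the exact relevance criterion, so the Python snippet below is only a minimal sketch under stated assumptions, not the authors' method. It assumes one linear separating plane `w·x + b = 0` per (binary, one-vs-rest) class, treats an example as relevant when its functional margin `y(w·x + b)` is at most 1 (loosely mirroring which points become support vectors in an SVM), and keeps relevant examples for a longer time window than ordinary ones. The names `Example`, `decision_value`, `is_relevant`, `retain`, `short_window`, `long_window` and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Example:
    features: Sequence[float]  # e.g. a bag-of-words vector for one tweet
    label: int                 # +1 or -1 (one binary problem per class)
    batch: int                 # index of the time window the example arrived in


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score of x with respect to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_relevant(ex: Example, w: Sequence[float], b: float,
                margin: float = 1.0) -> bool:
    """Hypothetical criterion: the example lies on or inside the margin
    (or on the wrong side), i.e. y * f(x) <= margin, so it helps to
    define the separating plane."""
    return ex.label * decision_value(w, b, ex.features) <= margin


def retain(examples: List[Example], w: Sequence[float], b: float,
           current_batch: int, short_window: int, long_window: int,
           margin: float = 1.0) -> List[Example]:
    """Keep ordinary examples for `short_window` batches and relevant
    examples for the longer `long_window` batches."""
    kept = []
    for ex in examples:
        age = current_batch - ex.batch
        window = long_window if is_relevant(ex, w, b, margin) else short_window
        if age < window:
            kept.append(ex)
    return kept


# Usage with a toy 2-D plane x1 - x2 = 0.
w, b = [1.0, -1.0], 0.0
pool = [
    Example([3.0, 0.0], +1, batch=0),  # y*f = 3: ordinary and old, dropped
    Example([0.5, 0.0], +1, batch=0),  # y*f = 0.5: relevant, kept longer
    Example([0.0, 2.0], -1, batch=2),  # y*f = 2: ordinary but recent, kept
]
kept = retain(pool, w, b, current_batch=3, short_window=2, long_window=5)
print([ex.batch for ex in kept])  # prints [0, 2]
```

In a dynamic-ensemble setting, each new ensemble member would then be trained on the retained pool plus the newest batch; how the paper combines members is not described in the abstract, so that step is left out here.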

Acknowledgments

This work is financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020 Programme) within project “POCI-01-0145-FEDER-006961”, and by national funds through the FCT (Fundação para a Ciência e a Tecnologia, Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.

This work was supported by national funds through the Portuguese Foundation for Science and Technology (FCT), and by the European Regional Development Fund (FEDER) through COMPETE 2020 – Operational Program for Competitiveness and Internationalization (POCI).

Author information

Corresponding author

Correspondence to Catarina Silva.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Costa, J., Silva, C., Antunes, M., Ribeiro, B. (2016). Choice of Best Samples for Building Ensembles in Dynamic Environments. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_3

  • DOI: https://doi.org/10.1007/978-3-319-44188-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44187-0

  • Online ISBN: 978-3-319-44188-7

  • eBook Packages: Computer Science, Computer Science (R0)