Abstract
We present an online ensemble approach, diversified dynamic weighted majority (DDWM), for classifying new data instances whose underlying concept distributions vary over time. Our approach maintains two weighted ensembles that differ in their level of diversity. An expert in either ensemble is updated or removed according to its classification accuracy, and a new expert is added based on the final global prediction of the algorithm and the global prediction of the ensemble for any data instance. Experimental evaluation on various artificial and real-world datasets indicates that DDWM classifies new data instances with high accuracy, irrespective of dataset size, type of drift, or presence of noise. We compare DDWM with other learners in terms of new performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. Our approach proved highly resource-efficient, achieving high accuracy even in a resource-constrained environment.
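The update cycle described above (weight experts, remove weak ones, add a new one) can be illustrated with a minimal sketch of a single weighted ensemble. This is not the DDWM procedure itself, which maintains two ensembles and uses its own rule for adding experts. It follows a generic dynamic-weighted-majority style of update, and the names `WeightedEnsemble` and `MajorityClassExpert`, the parameters `beta` and `theta`, and the rule "add an expert when the ensemble's prediction is wrong" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class MajorityClassExpert:
    """Toy base learner (an assumption): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Returns None until at least one label has been observed.
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class WeightedEnsemble:
    """Single weighted ensemble with a dynamic-weighted-majority style update."""

    def __init__(self, beta=0.5, theta=0.01):
        self.beta = beta    # hypothetical multiplicative penalty for a wrong expert
        self.theta = theta  # hypothetical weight threshold below which an expert is removed
        self.experts = [[MajorityClassExpert(), 1.0]]  # [expert, weight] pairs

    def predict(self, x):
        # Weighted vote over the experts' predictions.
        votes = defaultdict(float)
        for expert, weight in self.experts:
            votes[expert.predict(x)] += weight
        return max(votes, key=votes.get)

    def update(self, x, y):
        global_pred = self.predict(x)
        # Penalise every expert that misclassifies the instance.
        for pair in self.experts:
            if pair[0].predict(x) != y:
                pair[1] *= self.beta
        # Rescale so the largest weight is 1, then drop low-weight experts.
        top = max(weight for _, weight in self.experts)
        for pair in self.experts:
            pair[1] /= top
        self.experts = [p for p in self.experts if p[1] >= self.theta]
        # Assumed rule: add a fresh expert when the ensemble as a whole was wrong.
        if global_pred != y:
            self.experts.append([MajorityClassExpert(), 1.0])
        # Train all experts on the labelled instance.
        for expert, _ in self.experts:
            expert.learn(x, y)
        return global_pred


# Usage: a stream whose label changes abruptly from "a" to "b".
ensemble = WeightedEnsemble()
stream = [((0,), "a")] * 5 + [((0,), "b")] * 10
preds = [ensemble.update(x, y) for x, y in stream]
```

After the label change, the older experts lose weight on each mistake while the newly added experts carry the vote, so the ensemble's prediction switches to the new label within a few instances.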
Cite this article
Sidhu, P., Bhatia, M.P.S. A novel online ensemble approach to handle concept drifting data streams: diversified dynamic weighted majority. Int. J. Mach. Learn. & Cyber. 9, 37–61 (2018). https://doi.org/10.1007/s13042-015-0333-x
DOI: https://doi.org/10.1007/s13042-015-0333-x