Prequential AUC for Classifier Evaluation and Drift Detection in Evolving Data Streams

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8983)

Abstract

Detecting and adapting to concept drifts make learning data stream classifiers a difficult task. It becomes even more complex when the distribution of classes in the stream is imbalanced. Currently, proper assessment of classifiers for such data is still a challenge, as existing evaluation measures either do not take class imbalance into account or are unable to indicate changes in the class ratio over time. In this paper, we advocate the use of the area under the ROC curve (AUC) in imbalanced data stream settings and propose an efficient incremental algorithm that uses a sorted tree structure with a sliding window to compute AUC using constant time and memory. Additionally, we experimentally verify that this algorithm is capable of correctly evaluating classifiers on imbalanced streams and can be used as a basis for detecting changes in class definitions and imbalance ratio.
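
To make the idea of a windowed AUC concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels (1 for the positive class, 0 for the negative class) and a classifier that emits a numeric score before the true label is revealed. A plain sorted Python list stands in for the sorted tree structure mentioned above, and AUC is recomputed by a full scan of the window, so the sketch does not reproduce the complexity stated in the abstract. The class name `SlidingWindowAUC` and its methods are hypothetical.

```python
from bisect import bisect_left, insort
from collections import deque


class SlidingWindowAUC:
    """AUC over the most recent `window_size` (score, label) pairs.

    Illustrative only: a sorted list replaces the sorted tree, and
    AUC is recomputed by scanning the whole window on every call.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.arrivals = deque()  # (score, label) pairs in arrival order
        self.by_score = []       # the same pairs, kept sorted by score
        self.n_pos = 0
        self.n_neg = 0

    def add(self, score, label):
        """Record a score and its true label (1 = positive, 0 = negative)."""
        if len(self.arrivals) == self.window_size:
            # Forget the oldest example so the window keeps a fixed size.
            old = self.arrivals.popleft()
            self.by_score.pop(bisect_left(self.by_score, old))
            if old[1] == 1:
                self.n_pos -= 1
            else:
                self.n_neg -= 1
        item = (float(score), int(label))
        self.arrivals.append(item)
        insort(self.by_score, item)
        if item[1] == 1:
            self.n_pos += 1
        else:
            self.n_neg += 1

    def auc(self):
        """Fraction of (positive, negative) pairs ranked correctly.

        Tied scores count as half a correct pair. Returns NaN when the
        window holds only one class, since AUC is undefined there.
        """
        if self.n_pos == 0 or self.n_neg == 0:
            return float("nan")
        concordant = 0.0
        neg_below = 0  # negatives with a strictly lower score so far
        i, n = 0, len(self.by_score)
        while i < n:
            score = self.by_score[i][0]
            pos_tied = neg_tied = 0
            # Gather every example that shares this score.
            while i < n and self.by_score[i][0] == score:
                if self.by_score[i][1] == 1:
                    pos_tied += 1
                else:
                    neg_tied += 1
                i += 1
            concordant += pos_tied * (neg_below + 0.5 * neg_tied)
            neg_below += neg_tied
        return concordant / (self.n_pos * self.n_neg)


window = SlidingWindowAUC(window_size=4)
for score, label in [(0.9, 1), (0.8, 0), (0.7, 1), (0.1, 0)]:
    window.add(score, label)
print(window.auc())  # 0.75: three of the four positive/negative pairs are ordered correctly
```

In a prequential (test-then-train) loop, `add` would be called with the score the classifier produced for an example before that example is used for training, and `auc` would be read at whatever interval the evaluation needs. Because old examples leave the window, the reported value follows recent performance rather than the whole stream history.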

Notes

  1. Source code, test scripts, generator parameters, and links to datasets are available at: http://www.cs.put.poznan.pl/dbrzezinski/software.php.

Acknowledgments

The authors’ research was funded by the Polish National Science Center under Grant No. DEC-2013/11/B/ST6/00963.

Author information

Corresponding author

Correspondence to Dariusz Brzezinski.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Brzezinski, D., Stefanowski, J. (2015). Prequential AUC for Classifier Evaluation and Drift Detection in Evolving Data Streams. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2014. Lecture Notes in Computer Science, vol 8983. Springer, Cham. https://doi.org/10.1007/978-3-319-17876-9_6

  • DOI: https://doi.org/10.1007/978-3-319-17876-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17875-2

  • Online ISBN: 978-3-319-17876-9

  • eBook Packages: Computer Science, Computer Science (R0)
