The Algorithm APT to Classify in Concurrence of Latency and Drift

  • Conference paper
Advances in Intelligent Data Analysis X (IDA 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7014)

Abstract

Population drift, the change of probability distributions over time, is a challenging problem in classification. Known drift-adaptive classification methods such as incremental learning rely on current, labelled data to update the classification model, and thus assume that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy. First, the nature of drift in temporal data needs to be identified, which requires formulating explicit drift models for the underlying data-generating process. Second, these models are used to substitute for scarce labelled data when updating classification models.

This paper contributes an explicit drift model that characterises a mixture of independently evolving sub-populations. In this model, the joint distribution is a mixture of arbitrarily distributed sub-populations that drift over time. An arbitrary sub-population tracker (APT) algorithm is presented, which tracks and predicts these distributions using unlabelled data. Experimental evaluation shows that the APT algorithm can accurately track and predict changes in the posterior distribution of class labels.
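The abstract does not spell out how sub-populations are tracked, so the following is only a minimal illustration of the general idea, not the paper's APT implementation. It assumes that each sub-population is summarised by a single centroid, that the number of sub-populations stays constant, and that drift between two snapshots continues linearly. The function names `match_subpopulations` and `track_and_predict` and the toy data are hypothetical. Labelled centroids from an earlier snapshot are matched one-to-one to unlabelled centroids from a later snapshot by minimising the total distance, the class labels are carried over, and each centroid is extrapolated one step ahead.

```python
from itertools import permutations
from math import dist


def match_subpopulations(old_centroids, new_centroids):
    """Return the one-to-one mapping (old index -> new index) with minimal total distance.

    Brute force over all permutations: an assumption made for brevity,
    suitable only for a handful of sub-populations.
    """
    if len(old_centroids) != len(new_centroids):
        raise ValueError("this sketch assumes equally many sub-populations")
    n = len(old_centroids)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(
            dist(old_centroids[i], new_centroids[perm[i]]) for i in range(n)
        ),
    )
    return list(best)


def track_and_predict(old_centroids, old_labels, new_centroids):
    """Carry class labels over to the new centroids and extrapolate one step ahead."""
    mapping = match_subpopulations(old_centroids, new_centroids)
    tracked = []
    for i, j in enumerate(mapping):
        old, new = old_centroids[i], new_centroids[j]
        # Linear extrapolation (assumption): the same displacement repeats next step.
        predicted = tuple(2 * b - a for a, b in zip(old, new))
        tracked.append({"label": old_labels[i], "current": new, "predicted": predicted})
    return tracked


if __name__ == "__main__":
    # Labelled snapshot at an earlier time (hypothetical toy data).
    old_centroids = [(0.0, 0.0), (5.0, 5.0)]
    old_labels = ["negative", "positive"]
    # Unlabelled snapshot at a later time, listed in a different order.
    new_centroids = [(5.5, 4.0), (1.0, 0.5)]
    for sub in track_and_predict(old_centroids, old_labels, new_centroids):
        print(sub)
```

In this toy run the first labelled sub-population is matched to the second unlabelled centroid and vice versa, so the labels follow the drifting groups without any new labelled data. For more than a few sub-populations, a polynomial-time assignment solver would replace the brute-force search.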

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krempl, G. (2011). The Algorithm APT to Classify in Concurrence of Latency and Drift. In: Gama, J., Bradley, E., Hollmén, J. (eds) Advances in Intelligent Data Analysis X. IDA 2011. Lecture Notes in Computer Science, vol 7014. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24800-9_22

  • DOI: https://doi.org/10.1007/978-3-642-24800-9_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24799-6

  • Online ISBN: 978-3-642-24800-9

  • eBook Packages: Computer Science, Computer Science (R0)
