The Algorithm APT to Classify in Concurrence of Latency and Drift

Krempl, Georg

doi:10.1007/978-3-642-24800-9_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7014)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Population drift, meaning changes in probability distributions over time, is a challenging problem in classification. Known drift-adaptive classification methods such as incremental learning rely on current, labelled data for classification model updates, assuming that such labelled data are available without verification latency. However, verification latency is a relevant problem in some application domains, where predictions have to be made far into the future. This concurrence of drift and latency requires new approaches in machine learning. We propose a two-stage learning strategy. First, the nature of drift in temporal data is identified, which requires formulating explicit drift models for the underlying data-generating process. Second, these models are used to substitute for scarce labelled data when updating classification models.
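
The effect of verification latency under drift can be illustrated with a small numerical sketch. It is not taken from the paper: the one-dimensional feature, the two equally likely unit-variance Gaussian classes, the common drift speed, and the function names `accuracy`, `class_means` and `stale_vs_updated` are all assumptions made for illustration. The sketch compares a decision threshold fitted on labelled data at time 0 with one that could be refitted if current labels were available.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def accuracy(threshold, mean_neg, mean_pos):
    """Accuracy of the rule 'predict positive if x > threshold' for two
    equally likely classes with unit-variance Gaussian features."""
    correct_neg = normal_cdf(threshold - mean_neg)
    correct_pos = 1.0 - normal_cdf(threshold - mean_pos)
    return 0.5 * (correct_neg + correct_pos)


def class_means(t, speed=1.0):
    """Hypothetical drift: both class means shift by speed * t."""
    return 0.0 + speed * t, 4.0 + speed * t


def stale_vs_updated(t):
    """Return (stale accuracy, updated accuracy) at time t."""
    mean_neg, mean_pos = class_means(t)
    stale_threshold = 2.0  # fitted once on labelled data from t = 0
    updated_threshold = 0.5 * (mean_neg + mean_pos)  # needs current labels
    return (
        accuracy(stale_threshold, mean_neg, mean_pos),
        accuracy(updated_threshold, mean_neg, mean_pos),
    )


for t in (0.0, 1.0, 2.0, 3.0):
    stale, updated = stale_vs_updated(t)
    print(f"t={t:.0f}  stale={stale:.3f}  updated={updated:.3f}")
```

Under these assumptions the stale threshold loses accuracy as the horizon grows, while the updated threshold does not. The updated threshold is exactly what verification latency makes unavailable, which motivates replacing the missing labels with a drift model.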

This paper contributes an explicit drift model that characterises a mixture of independently evolving sub-populations. In this model, the joint distribution is a mixture of arbitrarily distributed sub-populations that drift over time. An arbitrary sub-population tracker (APT) algorithm is presented, which tracks and predicts these distributions using unlabelled data. Experimental evaluation shows that the APT algorithm is capable of accurately tracking and predicting changes in the posterior distribution of class labels.
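
The idea of following drifting sub-populations with unlabelled data can be sketched as follows. This is a simplified stand-in and not the APT algorithm itself: it assumes compact, well-separated sub-populations summarised by a centre, whereas the abstract allows arbitrarily distributed sub-populations. The nearest-centre update, the toy coordinates, and the names `track_centres` and `classify` are hypothetical. Each sub-population keeps the class label it had when labelled data were last available.

```python
from math import dist


def nearest(x, centres):
    """Index of the centre closest to point x."""
    return min(range(len(centres)), key=lambda i: dist(x, centres[i]))


def track_centres(centres, batch, n_iter=5):
    """Move each centre to the mean of the unlabelled points nearest to it."""
    centres = [tuple(c) for c in centres]
    for _ in range(n_iter):
        groups = [[] for _ in centres]
        for x in batch:
            groups[nearest(x, centres)].append(x)
        # A centre with no assigned points stays where it was.
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return centres


def classify(x, centres, labels):
    """Give x the class label of its nearest sub-population."""
    return labels[nearest(x, centres)]


# Sub-population centres and labels learned from labelled data at time 0.
old_centres = [(0.0, 0.0), (4.0, 0.0)]
labels = ["neg", "pos"]

# Later, only unlabelled data are available; both groups have drifted.
unlabelled = [
    (1.3, 1.9), (1.6, 2.1), (1.5, 2.0), (1.6, 2.0),
    (5.4, 1.9), (5.6, 2.1), (5.5, 2.0), (5.5, 2.0),
]
new_centres = track_centres(old_centres, unlabelled)

query = (3.0, 2.0)
print(classify(query, old_centres, labels), classify(query, new_centres, labels))
```

In this toy setting the query point lies nearer to the old "pos" centre but nearer to the drifted "neg" sub-population, so tracking the centres with unlabelled data changes the predicted label without any new labels being observed.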

Author information

Authors and Affiliations

Department of Statistics and Operations Research, University of Graz, Universitätsstraße 15/E3, 8010, Graz, Austria
Georg Krempl

Editor information

Editors and Affiliations

Faculty of Economics, LIAAD-INESC Porto, L.A., University of Porto, Rua de Ceuta, 118, 6, 4050-190, Porto, Portugal
João Gama
Department of Computer Science, University of Colorado, 80309-0430, Boulder, CO, USA
Elizabeth Bradley
Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Jaakko Hollmén

About this paper

Cite this paper

Krempl, G. (2011). The Algorithm APT to Classify in Concurrence of Latency and Drift. In: Gama, J., Bradley, E., Hollmén, J. (eds) Advances in Intelligent Data Analysis X. IDA 2011. Lecture Notes in Computer Science, vol 7014. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24800-9_22

DOI: https://doi.org/10.1007/978-3-642-24800-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24799-6
Online ISBN: 978-3-642-24800-9
eBook Packages: Computer Science, Computer Science (R0)
