Abstract
Data stream classification poses many challenges, most of which are not addressed by the state of the art. We present DXMiner, which addresses four major challenges to data stream classification: infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature-evolution are also major challenges, and most existing approaches ignore them. Concept-evolution occurs when novel classes arrive in the stream, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to the infinite length and concept-drift problems. Most existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some types of data, for example text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
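The abstract does not say how a model trained on one feature set should handle test instances that carry features it has never seen. The sketch below assumes one simple strategy for illustration: map the model and the instance into the union of their feature spaces, treat absent features as 0, and flag an instance as a novel-class candidate when it falls outside the radius of its nearest class. `ChunkModel`, `union_space`, and the centroid-and-radius classifier are hypothetical names chosen for this example. They are not DXMiner's actual implementation. A practical detector would likely need further evidence, such as many mutually close outliers, before declaring a novel class. That step is omitted here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SparseVec = Dict[str, float]  # feature name -> value, e.g. a term weight


def union_space(model_features: Set[str], x: SparseVec) -> List[str]:
    # Hypothetical homogenization: keep every feature known to either the
    # model or the instance; each side gets 0.0 for features it lacks.
    return sorted(model_features | set(x))


def densify(x: SparseVec, space: List[str]) -> List[float]:
    return [x.get(f, 0.0) for f in space]


def distance(a: SparseVec, b: SparseVec, space: List[str]) -> float:
    # Euclidean distance measured in the shared (union) feature space.
    return math.dist(densify(a, space), densify(b, space))


class ChunkModel:
    """Toy model built once from one labeled chunk: a centroid and radius per class."""

    def __init__(self, chunk: List[SparseVec], labels: List[str]) -> None:
        self.features: Set[str] = set()
        for x in chunk:
            self.features |= set(x)
        groups: Dict[str, List[SparseVec]] = defaultdict(list)
        for x, y in zip(chunk, labels):
            groups[y].append(x)
        space = sorted(self.features)
        self.centroids: Dict[str, SparseVec] = {}
        self.radii: Dict[str, float] = {}
        for y, xs in groups.items():
            c = {f: sum(v.get(f, 0.0) for v in xs) / len(xs) for f in space}
            self.centroids[y] = c
            # Radius = distance of the farthest training member from the centroid.
            self.radii[y] = max(distance(v, c, space) for v in xs)

    def predict(self, x: SparseVec) -> Tuple[str, bool]:
        """Return (nearest class, is_outlier) after homogenizing feature spaces."""
        space = union_space(self.features, x)
        best, best_d = None, math.inf
        for y, c in self.centroids.items():
            d = distance(x, c, space)
            if d < best_d:
                best, best_d = y, d
        # Outside the nearest class's radius: a candidate novel-class instance.
        return best, best_d > self.radii[best]


if __name__ == "__main__":
    chunk = [{"goal": 1.0, "match": 1.0}, {"goal": 1.0, "team": 1.0},
             {"vote": 1.0, "poll": 1.0}, {"vote": 1.0, "senate": 1.0}]
    labels = ["sports", "sports", "politics", "politics"]
    model = ChunkModel(chunk, labels)
    print(model.predict({"goal": 1.0, "team": 0.5}))  # ('sports', False)
    # Only features the model never saw, so the instance is flagged.
    print(model.predict({"qubit": 3.0, "entangle": 3.0})[1])  # True
```

Taking the union means that features which only the instance has still count toward its distance. That is why an instance built entirely from new terms lands far from every existing class. A fixed feature set chosen up front would instead drop those terms and could not tell such an instance apart from an empty one.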
References
Chen, S., Wang, H., Zhou, S., Yu, P.: Stop chasing trends: Discovering high order models in evolving data. In: Proc. ICDE 2008, pp. 923–932 (2008)
Fan, W.: Systematic data selection to mine concept-drifting data streams. In: Proc. ACM SIGKDD, Seattle, WA, USA, pp. 128–137 (2004)
Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: SIGKDD, San Francisco, CA, USA, pp. 97–106 (August 2001)
Katakis, I., Tsoumakas, G., Vlahavas, I.: Dynamic feature space and incremental feature selection for the classification of textual data streams. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 102–116. Springer, Heidelberg (2006)
Kolter, J., Maloof, M.: Using additive expert ensembles to cope with concept drift. In: ICML, Bonn, Germany, pp. 449–456 (August 2005)
Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: Integrating novel class detection with classification for concept-drifting data streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009. LNCS, vol. 5782, pp. 79–94. Springer, Heidelberg (2009). Extended version in IEEE TKDE preprints, vol. 99 (2010), doi:10.1109/TKDE.2010.61
Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: A practical approach to classify evolving data streams: Training with limited amount of labeled data. In: Perner, P. (ed.) ICDM 2008. LNCS (LNAI), vol. 5077, pp. 929–934. Springer, Heidelberg (2008)
Spinosa, E.J., de Leon, A.P., de Carvalho, F., Gama, J.: Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In: ACM SAC, pp. 976–980 (2008)
Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: KDD 2003, pp. 226–235 (2003)
Wenerstrom, B., Giraud-Carrier, C.: Temporal data mining in dynamic feature spaces. In: Perner, P. (ed.) ICDM 2006. LNCS (LNAI), vol. 4065, pp. 1141–1145. Springer, Heidelberg (2006)
Yang, Y., Wu, X., Zhu, X.: Combining proactive and reactive predictions for data streams. In: Proc. SIGKDD, pp. 710–715 (2005)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Masud, M.M., Chen, Q., Gao, J., Khan, L., Han, J., Thuraisingham, B. (2010). Classification and Novel Class Detection of Data Streams in a Dynamic Feature Space. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15883-4_22
DOI: https://doi.org/10.1007/978-3-642-15883-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15882-7
Online ISBN: 978-3-642-15883-4
eBook Packages: Computer Science; Computer Science (R0)