Online Clustering for Evolving Data Streams with Online Anomaly Detection

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Abstract

Clustering data streams is an emerging challenge with a wide range of applications in areas including Wireless Sensor Networks, the Internet of Things, finance and social media. In an evolving data stream, a clustering algorithm should both (a) assign observations to clusters and (b) identify anomalies in real time. Current state-of-the-art algorithms do not address requirement (b) because they consider only the spatial proximity of data, which results in (1) poor clustering and (2) a poor account of the temporal evolution of data in noisy environments. In this paper, we propose an online clustering algorithm that considers the temporal proximity of observations as well as their spatial proximity to identify anomalies in real time. It identifies the evolution of clusters in noisy streams, incrementally updates the model and calculates the minimum window length over the evolving data stream without jeopardizing performance. To the best of our knowledge, this is the first online clustering algorithm that both identifies anomalies in real time and discovers the temporal evolution of clusters. Our contributions are supported by experiments on synthetic as well as real-world data.
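The idea of combining temporal and spatial proximity can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given in the abstract; it is a toy under stated assumptions. The class name `TemporalSpatialClusterer` and the parameters `radius`, `window` and `min_points` are hypothetical, clusters are plain running-mean centroids, and `window` is a fixed constant rather than a computed minimum window length. An observation that is far from every cluster is flagged as an anomaly immediately, but it is also buffered; if enough spatially close observations arrive within `window` time units, they are promoted to a new cluster.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    center: list
    count: int = 1

    def update(self, x):
        # Incremental (running-mean) update of the centroid.
        self.count += 1
        self.center = [c + (xi - c) / self.count for c, xi in zip(self.center, x)]


class TemporalSpatialClusterer:
    """Toy online clusterer; all names and thresholds are illustrative assumptions."""

    def __init__(self, radius=1.0, window=3, min_points=2):
        self.radius = radius          # spatial proximity threshold
        self.window = window          # temporal proximity threshold
        self.min_points = min_points  # points needed to form a new cluster
        self.clusters = []
        self.buffer = []              # recent (timestamp, point) pairs flagged as anomalies

    def observe(self, t, x):
        # 1. Spatially close to an existing cluster: assign and update it.
        for i, c in enumerate(self.clusters):
            if math.dist(c.center, x) <= self.radius:
                c.update(x)
                return ("cluster", i)
        # 2. Forget buffered anomalies that are too old to matter.
        self.buffer = [(s, p) for s, p in self.buffer if t - s <= self.window]
        # 3. Split the buffer into points near x and the rest.
        near = [(s, p) for s, p in self.buffer if math.dist(p, x) <= self.radius]
        rest = [(s, p) for s, p in self.buffer if math.dist(p, x) > self.radius]
        if len(near) + 1 >= self.min_points:
            # Close in both space and time: promote to a new cluster.
            new = Cluster(center=list(x))
            for _, p in near:
                new.update(p)
            self.clusters.append(new)
            self.buffer = rest
            return ("new_cluster", len(self.clusters) - 1)
        # 4. Otherwise report an anomaly now and keep it for later re-evaluation.
        self.buffer.append((t, x))
        return ("anomaly", None)
```

In this sketch, two distant points that arrive far apart in time are both reported as anomalies, whereas the same two points arriving within `window` of each other start a new cluster. That is the sense in which temporal proximity changes the outcome compared with a purely spatial rule.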

Notes

  1. https://goo.gl/zcAijP.

  2. https://www.kaggle.com/START-UMD/gtd.

Author information

Corresponding author

Correspondence to Milad Chenaghlou.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Chenaghlou, M., Moshtaghi, M., Leckie, C., Salehi, M. (2018). Online Clustering for Evolving Data Streams with Online Anomaly Detection. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_40

  • DOI: https://doi.org/10.1007/978-3-319-93037-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)
