Skip to main content

Efficiently Clustering Probabilistic Data Streams

  • Conference paper
Advances in Data and Web Management (APWeb 2009, WAIM 2009)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5446))

  • 1169 Accesses

Abstract

Data mining on uncertain data stream has attracted a lot of attentions because of the widely existed imprecise data generated from a variety of streaming applications in recent years. The main challenge of mining uncertain data streams stems from the strict space and time requirements of processing arriving tuples in high-speed. When new tuples arrive, the number of the possible world instances will increase exponentially related to the volume of the data stream. As one of the most important mining task, how to devise clustering algorithms has been studied intensively on deterministic data streams, whereas the work on the uncertain data streams still remains rare. This paper proposes a novel solution for clustering on uncertain data streams in point probability model, where the existence of each tuple is uncertain. Detailed analysis and the thorough experimental reports both on synthetic and real data sets illustrate the advantages of our new method in terms of effectiveness and efficiency.

This work is supported by Shanghai Leading Academic Discipline Project (Project Number: B412) and National Natural Science Foundation of China (NSFC) under grant No. 60803020.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. MacQueen, J.B.: Some Methods for classification and Analysis of Multivariate Observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, vol. 1, pp. 281–297. University of California Press,

    Google Scholar 

  2. Aggarwal, C.C., Yu, P.S.: A Framework for Clustering Uncertain Data Streams. In: Proc. of ICDE (2008)

    Google Scholar 

  3. OCallaghan, L., Meyerson, A., Motwani, R., Mishra, N., Guha, S.: Streaming-Data Algorithms for High-Quality Clustering. In: Proc. of ICDE (2002)

    Google Scholar 

  4. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A Framework for Clustering Evolving Data Streams. In: Proc. of VLDB (2003)

    Google Scholar 

  5. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proc. of SIGMOD (1996)

    Google Scholar 

  6. Zhou, A., Cao, F., Qian, W., Jin, C.: Tracking clusters in evolving data streams over sliding windows. Knowledge and Information System Journal (KAIS) (2007)

    Google Scholar 

  7. Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A Framework for Projected Clustering of High Dimensional Data Streams. In: Proc. of VLDB (2004)

    Google Scholar 

  8. Tasoulis, D.K., Adams, N.M., Hand, D.J.: Unsupervised Clustering In Streaming Data. In: Proc. of ICDM (2006)

    Google Scholar 

  9. Kriegel, H.-P., Pfeifle, M.: Density-Based Clustering of Uncertain Data. In: Proc. of KDD (2005)

    Google Scholar 

  10. Kriegel, H.-P., Pfeifle, M.: Hierarchical Density-Based Clustering of Uncertain Data. In: Proc. of ICDM (2005)

    Google Scholar 

  11. Ngai, W.K., Kao, B., Chui, C.K., Cheng, R., Chau, M., Yip, K.Y.: Efficient Clustering of Uncertain Data. In: Proc. of ICDM (2006)

    Google Scholar 

  12. Cormode, G., Garofalakis, M.N.: Sketching probabilistic data streams. In: Proc. of SIGMOD (2007)

    Google Scholar 

  13. Jayram, T.S., McGregor, A., Muthukrishnan, S., Vee, E.: Estimating statistical aggregates on probabilistic data streams. In: Proc. of PODS (2007)

    Google Scholar 

  14. Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, C., Jin, C., Zhou, A. (2009). Efficiently Clustering Probabilistic Data Streams. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb WAIM 2009 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-00672-2_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00671-5

  • Online ISBN: 978-3-642-00672-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics