Efficiently Clustering Probabilistic Data Streams

Zhang, Chen; Jin, Cheqing; Zhou, Aoying

doi:10.1007/978-3-642-00672-2_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5446)

Abstract

Mining uncertain data streams has attracted considerable attention in recent years because imprecise data are generated by a wide variety of streaming applications. The main challenge stems from the strict space and time constraints of processing tuples that arrive at high speed. As new tuples arrive, the number of possible world instances grows exponentially with the volume of the data stream. Clustering, one of the most important mining tasks, has been studied intensively on deterministic data streams, whereas work on uncertain data streams remains rare. This paper proposes a novel solution for clustering uncertain data streams in the point probability model, where the existence of each tuple is uncertain. Detailed analysis and thorough experiments on both synthetic and real data sets illustrate the advantages of our new method in terms of effectiveness and efficiency.
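To make the possible-world blow-up concrete, the following minimal sketch enumerates the worlds of a tiny stream under the point probability model. It is an illustration only, not the method proposed in the paper. It assumes that tuples exist independently of one another, and the names `possible_worlds`, `expected_sum_bruteforce`, `expected_sum_streaming` and the sample `stream` are hypothetical. With n tuples there are 2^n worlds, yet a simple aggregate such as the expected sum can be maintained in one pass by linearity of expectation.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) for every possible world.

    `tuples` is a list of (value, existence_probability) pairs.
    Assumption (not stated in the abstract): tuples exist independently.
    """
    for mask in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for present, (value, p) in zip(mask, tuples):
            # Multiply in p if the tuple is present, 1 - p if it is absent.
            prob *= p if present else (1.0 - p)
            if present:
                world.append(value)
        yield world, prob


def expected_sum_bruteforce(tuples):
    """Expected sum computed over all 2^n possible worlds."""
    return sum(prob * sum(world) for world, prob in possible_worlds(tuples))


def expected_sum_streaming(tuples):
    """Expected sum computed in one pass with O(1) state."""
    total = 0.0
    for value, p in tuples:
        total += p * value  # linearity of expectation
    return total


# Hypothetical three-tuple stream: (value, existence probability).
stream = [(2.0, 0.5), (4.0, 0.25), (10.0, 0.9)]
worlds = list(possible_worlds(stream))
print(len(worlds))                      # number of worlds is 2 ** len(stream)
print(expected_sum_bruteforce(stream))
print(expected_sum_streaming(stream))
```

Both functions agree up to floating-point rounding, but only the one-pass version stays feasible as the stream grows, which is why stream algorithms for uncertain data avoid materializing possible worlds.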

This work is supported by the Shanghai Leading Academic Discipline Project (Project Number: B412) and the National Natural Science Foundation of China (NSFC) under grant No. 60803020.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Fudan University, P.R. China
Chen Zhang
Software Engineering Institute of East China Normal University, P.R. China
Cheqing Jin & Aoying Zhou
Shanghai Key Laboratory of Trustworthy Computing, P.R. China
Cheqing Jin & Aoying Zhou

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Qing Li
Department of Computer Science & Technology, Tsinghua University, Beijing, China
Ling Feng
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby BC, Canada
Jian Pei
Department of Computer Science, University of Vermont, VT 05405, Burlington, USA
Sean X. Wang
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou
Jiangsu Provincial Key Lab of Computer Information Processing Technology, School of Computer Science & Technology, Soochow University, 1 Shizi Street, Suzhou, 215006, Jiangsu, China
Qiao-Ming Zhu

About this paper

Cite this paper

Zhang, C., Jin, C., Zhou, A. (2009). Efficiently Clustering Probabilistic Data Streams. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb/WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_25

DOI: https://doi.org/10.1007/978-3-642-00672-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)
