Star-Scan: A Stable Clustering by Statistically Finding Centers and Noises

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9931)

Abstract

In this paper, we present a new clustering algorithm called A \(\mathbf{Sta}\)ble Cluste\(\mathbf{r}\)ing by \(\mathbf{S}\)tatistically Finding \(\mathbf{C}\)enters \(\mathbf{a}\)nd \(\mathbf{N}\)oises (Star-Scan). Star-Scan is a density-based clustering algorithm that finds clusters of arbitrary shape and is robust to noise in the dataset. It borrows from Rodriguez’s Clustering by Fast Search and Find of Density Peaks (CFSFDP) the idea that cluster centers are points with both a higher density than their neighbors and a larger distance to other centers. Unlike CFSFDP, which requires cluster centers to be picked manually, Star-Scan selects them automatically with a statistical method, the box plot. Furthermore, because center selection as done in CFSFDP can be inadequate, we apply a merging post-process to the produced clusters to obtain stable and correct results. Finally, we also use the box plot to filter out noise in each final cluster, which addresses the over-filtering problem of CFSFDP. We demonstrate the good performance of Star-Scan on several synthetic datasets.
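To make the center-selection idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The local density rho (number of points within a cutoff distance `d_c`) and the distance delta (distance to the nearest point of higher density) follow the standard CFSFDP definitions. The abstract does not say which quantity the box plot is applied to or which whisker length is used, so the sketch assumes the product gamma = rho * delta and the conventional upper fence Q3 + 1.5 * IQR. The function names `density_peak_scores`, `upper_fence`, and `select_centers` are hypothetical.

```python
import math
import statistics


def density_peak_scores(points, d_c):
    """Return (rho, delta) for each point, using the CFSFDP definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index so
    # that two equally dense points do not both look like the global peak.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; by convention it
            # receives its largest distance to any other point.
            delta[i] = max(dist[i])
        else:
            # Distance to the nearest point that ranks as denser.
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


def upper_fence(values, k=1.5):
    """Box-plot upper fence Q3 + k * IQR (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def select_centers(points, d_c, k=1.5):
    """Indices whose gamma = rho * delta is a box-plot outlier (assumed rule)."""
    rho, delta = density_peak_scores(points, d_c)
    gamma = [r * d for r, d in zip(rho, delta)]
    fence = upper_fence(gamma, k)
    return [i for i, g in enumerate(gamma) if g > fence]
```

For example, with two well-separated groups of seven collinear points, `select_centers([(float(x), 0.0) for x in list(range(7)) + list(range(100, 107))], d_c=3.5)` flags only the middle point of each group. The merging post-process and the per-cluster noise filter mentioned in the abstract are not sketched here because the abstract gives no details about them.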

References

  1. Inuk, J., Jong, C.P., Sun, K.: piClust: a density based piRNA clustering algorithm. J. Comput. Biol. Chem. 50, 60–67 (2014)

  2. Amineh, A., Ying, W.T., Hadi, S.: On density-based data streams clustering algorithms: a survey. J. Comput. Sci. Technol. 29(1), 116–141 (2014)

  3. Xianchao, Z., Han, L., Xiaotong, Z., Xinyue, L.: Novel density-based clustering algorithms for uncertain data. In: The Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 2191–2197. AAAI Press (2014)

  4. Levi, L., Jörg, S.: Semi-supervised density-based clustering. In: The Ninth IEEE International Conference on Data Mining, pp. 842–847. IEEE Computer Society (2009)

  5. Son, T., Xiao, H., Nina, H., Claudia, P., Christian, B.: Active density-based clustering. In: IEEE 13th International Conference on Data Mining, pp. 508–517. IEEE Computer Society (2013)

  6. Michael, B.E., Paul, T.S., Patrick, O.B., David, B.: Cluster analysis and display of genome-wide expression patterns. In: National Academy of Sciences of the United States of America (PNAS), pp. 14863–14868. HighWire Press (1998)

  7. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. CSUR 31, 264–323 (1999)

  8. Jiawei, H., Micheline, K.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2000)

  9. Hans-Peter, K., Peer, K., Jörg, S., Arthur, Z.: Density-based clustering. WIREs Data Min. Knowl. Discov. 1, 231–240 (2011)

  10. Mihael, A., Markus, M.B., Hans-Peter, K., Jörg, S.: OPTICS: ordering points to identify the clustering structure. In: ACM SIGMOD International Conference on Management of Data, pp. 49–60. ACM Press, Philadelphia (1999)

  11. Alexander, H., Daniel, K.: An efficient approach to clustering in large multimedia databases with noise. In: The Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), pp. 58–65. AAAI Press, New York (1998)

  12. Hinneburg, A., Gabriel, H.-H.: DENCLUE 2.0: fast clustering based on kernel density estimation. In: Berthold, M., Shawe-Taylor, J., Lavrač, N. (eds.) IDA 2007. LNCS, vol. 4723, pp. 70–80. Springer, Heidelberg (2007)

  13. Martin, E., Hans-Peter, K., Jörg, S., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: The Second International Conference on Knowledge Discovery and Data Mining (KDD-1996), pp. 226–231. AAAI Press, Portland, Oregon, USA (1996)

  14. Alex, R., Alessandro, L.: Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)

  15. Robert, D.M., Douglas, A.L., William, G.M.: Statistics: An Introduction. Duxbury Press, London (1994)

  16. Junhao, G., Yufei, T.: DBSCAN revisited: mis-claim, un-fixability, and approximation. In: ACM SIGMOD International Conference on Management of Data (SIGMOD 2015), pp. 519–530. ACM Press, Melbourne, Victoria, Australia (2015)

  17. Clustering Datasets. http://cs.joensuu.fi/sipu/datasets/

  18. Chameleon Datasets. http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants No. 60773216 and No. 60773217.

Author information

Corresponding author

Correspondence to Qing Liu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, N., Liu, Q., Li, Y., Xiao, L., Liu, X. (2016). Star-Scan: A Stable Clustering by Statistically Finding Centers and Noises. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9931. Springer, Cham. https://doi.org/10.1007/978-3-319-45814-4_37

  • DOI: https://doi.org/10.1007/978-3-319-45814-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45813-7

  • Online ISBN: 978-3-319-45814-4

  • eBook Packages: Computer Science, Computer Science (R0)
