Abstract
The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN, however, is that at least one labeled object is provided for each natural cluster in the dataset. This assumption may be unrealistic when only very few labeled objects can be provided, for instance because determining the class label of an object is costly. In this paper, we introduce a novel active learning strategy that selects the “most representative” objects, whose class labels are then determined and used as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that represent the whole data space well. Experiments on synthetic and real datasets show that, using the proposed active learning strategy, SSDBSCAN is able to extract more meaningful clusters even when only very few labeled objects are provided.
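The abstract does not spell out the selection objective, so the following is only a minimal sketch, under our own assumptions, of how a locally linear reconstruction term combined with a graph-Laplacian term can drive representative selection. All function names and the parameters `k`, `gamma`, and `mu` are hypothetical, and the greedy search is a simplification, not necessarily the authors' optimization procedure. Each object is written as an affine combination of its `k` nearest neighbours (weights `W`, rows summing to 1). The matrix `R = (I - W)^T (I - W) + gamma * L` then penalizes both violations of these local reconstructions and differences across kNN-graph edges. Given a candidate set `S`, the data are reconstructed by minimizing `tr(Y^T R Y) + mu * sum_{s in S} ||y_s - x_s||^2`, whose minimizer solves `(R + mu * Lambda_S) Y = mu * Lambda_S X` (with `Lambda_S` the 0/1 indicator diagonal of `S`, plus a tiny ridge term so the system stays solvable on a disconnected graph). Objects are added one at a time so that the remaining error `||X - Y||_F^2` drops the most; the returned indices would be the objects whose labels are queried and handed to SSDBSCAN.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (the point itself excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def lle_weights(X, nbrs, reg=1e-3):
    """Locally linear reconstruction weights: row i rebuilds x_i from its neighbours, rows sum to 1."""
    n, k = nbrs.shape
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                          # neighbours centred on x_i
        G = Z @ Z.T                                    # local Gram matrix (k x k)
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)   # regularise: G is singular when k > d
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                    # enforce the sum-to-one constraint
    return W


def knn_laplacian(nbrs):
    """Unnormalised Laplacian L = D - A of the symmetrised, unweighted kNN graph."""
    n, k = nbrs.shape
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)
    return np.diag(A.sum(axis=1)) - A


def reconstruction_error(X, R, selected, mu=100.0, eps=1e-8):
    """||X - Y||_F^2, where Y minimises tr(Y^T R Y) + mu * sum_{s in S} ||y_s - x_s||^2."""
    n = X.shape[0]
    pin = np.zeros(n)
    pin[selected] = mu                                   # diagonal of mu * Lambda_S
    # Normal equations (R + mu*Lambda_S + eps*I) Y = mu*Lambda_S X; eps keeps them solvable
    Y = np.linalg.solve(R + np.diag(pin) + eps * np.eye(n), pin[:, None] * X)
    return float(np.sum((X - Y) ** 2))


def select_representatives(X, m, k=6, gamma=0.5, mu=100.0):
    """Greedily pick m objects whose pinned values best reconstruct the whole dataset."""
    nbrs = knn_indices(X, k)
    I_W = np.eye(len(X)) - lle_weights(X, nbrs)
    R = I_W.T @ I_W + gamma * knn_laplacian(nbrs)        # reconstruction term + Laplacian regulariser
    selected = []
    for _ in range(m):
        candidates = [i for i in range(len(X)) if i not in selected]
        errors = [reconstruction_error(X, R, selected + [c], mu) for c in candidates]
        selected.append(candidates[int(np.argmin(errors))])
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs of 25 points each (synthetic toy data)
    X = np.vstack([rng.normal([-10.0, 0.0], 1.0, size=(25, 2)),
                   rng.normal([10.0, 0.0], 1.0, size=(25, 2))])
    print(select_representatives(X, m=2))  # indices of objects to query for labels
```

Because every row of `W` sums to 1 and every row of `L` sums to 0, pinning a single object in a connected component reconstructs that whole component as a constant, while an unpinned component collapses towards zero. The greedy step therefore tends to cover each natural cluster before refining within one, which is the property SSDBSCAN needs. Each step solves one dense n-by-n system per candidate, so this sketch only suits small datasets; a practical version would use sparse matrices and incremental updates.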
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, J., Sander, J., Campello, R., Zimek, A. (2014). Active Learning Strategies for Semi-Supervised DBSCAN. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_16
DOI: https://doi.org/10.1007/978-3-319-06483-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06482-6
Online ISBN: 978-3-319-06483-3
eBook Packages: Computer Science, Computer Science (R0)