DBSCAN-like clustering method for various data densities

Scitovski, Rudolf; Sabo, Kristian

doi:10.1007/s10044-019-00809-z

Theoretical advances
Published: 05 April 2019

Volume 23, pages 541–554 (2020)

Pattern Analysis and Applications

1351 Accesses
27 Citations

Abstract

In this paper, we propose a modification of the well-known DBSCAN algorithm that recognizes clusters with various data densities in a given set of data points \(\mathcal{A}=\{a^i\in\mathbb{R}^n : i=1,\dots,m\}\). First, we define the parameter \(MinPts=\lfloor \ln|\mathcal{A}|\rfloor\). Then, using a standard procedure from the DBSCAN algorithm, we determine for each \(a\in\mathcal{A}\) the radius \(\epsilon_a\) of the circle centered at \(a\) that contains MinPts elements of \(\mathcal{A}\). We group the set of all these radii into the most appropriate number \(t\) of clusters, using the least-squares distance-like function and the SymDIRECT or SepDIRECT algorithm. In this way, we obtain the parameters \(\epsilon_1>\dots>\epsilon_t\). Next, for the parameters \(\{MinPts,\epsilon_1\}\) we construct a partition, starting with one cluster and adding new clusters for as long as there exist isolated groups of at least MinPts data points in some circle of radius \(\epsilon_1\). We follow a similar procedure for the other parameters \(\epsilon_2,\dots,\epsilon_t\). After this stage, more clusters appear than can be expected in the optimal partition. Some of them are therefore merged according to defined criteria, using a merging process for which a detailed algorithm is given. We show that, compared with the standard DBSCAN algorithm, the proposed method has a clear advantage for data with various densities.
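The parameter-selection steps described in the abstract (MinPts, the radii \(\epsilon_a\), and their grouping into levels \(\epsilon_1>\dots>\epsilon_t\)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- A point counts toward its own MinPts, following the DBSCAN convention, so \(\epsilon_a\) is the distance from \(a\) to its (MinPts−1)-th nearest other point.
- The number of groups \(t\) is supplied by the caller rather than chosen as in the paper.
- The SymDIRECT/SepDIRECT optimizers are replaced by a different technique: an exact dynamic program for one-dimensional least-squares clustering. This works because optimal 1-D least-squares clusters are contiguous in sorted order.
- Each level is represented by the mean of its group.

All function names are hypothetical.

```python
import math

import numpy as np


def min_pts(m: int) -> int:
    """MinPts = floor(ln |A|), as defined in the abstract."""
    return math.floor(math.log(m))


def eps_radii(A: np.ndarray, k: int) -> np.ndarray:
    """Radius eps_a of the smallest ball around each a that holds k points of A.

    Assumption: a counts toward its own k (DBSCAN convention), so eps_a is the
    distance to the (k-1)-th nearest other point. O(m^2) memory; sketch only.
    """
    D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    D.sort(axis=1)  # column 0 holds the zero self-distance
    return D[:, k - 1]


def ls_partition_1d(x, t: int) -> list:
    """Globally optimal least-squares partition of 1-D data into t groups.

    Stand-in for SymDIRECT/SepDIRECT: optimal 1-D LS clusters are contiguous
    in sorted order, so an O(t m^2) dynamic program finds the exact optimum.
    """
    x = np.sort(np.asarray(x, dtype=float))
    m = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i: int, j: int) -> float:
        # sum of squared deviations of x[i:j] from its mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = np.full((t + 1, m + 1), np.inf)
    cut = np.zeros((t + 1, m + 1), dtype=int)
    cost[0, 0] = 0.0
    for g in range(1, t + 1):          # number of groups used so far
        for j in range(g, m + 1):      # first j sorted values covered
            for i in range(g - 1, j):  # last group is x[i:j]
                c = cost[g - 1, i] + sse(i, j)
                if c < cost[g, j]:
                    cost[g, j], cut[g, j] = c, i
    groups, j = [], m
    for g in range(t, 0, -1):          # backtrack the cut points
        i = cut[g, j]
        groups.append(x[i:j])
        j = i
    return groups[::-1]


def eps_levels(A: np.ndarray, t: int):
    """Return MinPts and t radius levels eps_1 > ... > eps_t."""
    k = min_pts(len(A))
    if k < 1:
        raise ValueError("need at least 3 points so that floor(ln m) >= 1")
    groups = ls_partition_1d(eps_radii(A, k), t)
    # assumption: each level is the mean radius of its group
    return k, sorted((float(g.mean()) for g in groups), reverse=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # a dense blob and a sparse blob: the radii should split into two levels
    A = np.vstack([rng.normal(0, 0.1, (100, 2)), rng.normal(5, 1.0, (100, 2))])
    k, levels = eps_levels(A, t=2)  # k == 5 because floor(ln 200) = 5
    print(k, levels)
```

The abstract leaves two details open: how \(t\) is chosen, and whether a level is the group mean, the group's largest radius, or something else. Adjust `eps_levels` if the paper specifies otherwise.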
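For a fixed pair such as \(\{MinPts,\epsilon_1\}\), the abstract describes starting with one cluster and adding new ones while isolated groups of at least MinPts points within radius \(\epsilon_1\) remain. This behaves like the classical DBSCAN expansion step. Below is a minimal sketch of one such pass, assuming Euclidean distance and that a point belongs to its own \(\epsilon\)-neighbourhood. `dbscan_pass` is a hypothetical name, and the sketch is plain DBSCAN, not the authors' full procedure. The abstract does not specify how the passes for \(\epsilon_2,\dots,\epsilon_t\) are combined or which merging criteria are used, so neither is reproduced here.

```python
from collections import deque

import numpy as np


def dbscan_pass(A: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """One classical DBSCAN labelling for a single (MinPts, eps) pair.

    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    """
    m = len(A)
    D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    # eps-neighbourhood of each point (the point itself included)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(m)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(m, -1)
    cluster = 0
    for seed in range(m):
        if labels[seed] != -1 or not is_core[seed]:
            continue
        # an unlabelled core point starts a new cluster ...
        labels[seed] = cluster
        queue = deque([seed])
        while queue:  # ... which grows through density-reachable points
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:  # only core points extend the cluster
                        queue.append(q)
        cluster += 1
    return labels


if __name__ == "__main__":
    A = np.array([[0], [1], [2], [10], [11], [12], [30]], dtype=float)
    print(dbscan_pass(A, eps=1.5, min_pts=3).tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

In the toy call, the two tight groups become clusters 0 and 1, and the isolated point at 30 is labelled as noise. The full distance matrix keeps the sketch short but needs \(O(m^2)\) memory. A spatial index would be the usual choice for larger sets.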

Acknowledgements

The authors would like to thank the referees and the journal editors for their careful reading of the paper and for insightful comments that helped us improve it. In particular, the authors thank Mrs. Katarina Moržan for significantly improving the English of the paper. This work was supported by the Croatian Science Foundation through research Grant IP-2016-06-6545 “The optimization and statistical models and methods in recognizing properties of data sets measured with errors” and research Grant IP-2016-06-8350 “Methodological framework for efficient energy management by intelligent data analytics”.

Author information

Authors and Affiliations

Department of Mathematics, University of Osijek, Trg Ljudevita Gaja 6, 31 000, Osijek, Croatia
Rudolf Scitovski & Kristian Sabo

Corresponding author

Correspondence to Rudolf Scitovski.

About this article

Cite this article

Scitovski, R., Sabo, K. DBSCAN-like clustering method for various data densities. Pattern Anal Applic 23, 541–554 (2020). https://doi.org/10.1007/s10044-019-00809-z

Received: 09 December 2017
Accepted: 22 March 2019
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10044-019-00809-z
