Data clustering based on principal curves

Moraes, Elson Claudio Correa; Ferreira, Danton Diego; Vitor, Giovani Bernardes; Barbosa, Bruno Henrique Groenner

doi:10.1007/s11634-019-00363-w

Regular Article
Published: 11 June 2019

Volume 14, pages 77–96 (2020)

Advances in Data Analysis and Classification

Elson Claudio Correa Moraes¹,
Danton Diego Ferreira ORCID: orcid.org/0000-0002-4504-7721¹,
Giovani Bernardes Vitor² &
…
Bruno Henrique Groenner Barbosa¹

Abstract

In this contribution, we present a new method for data clustering based on principal curves. Principal curves are a nonlinear generalization of principal component analysis and may also be regarded as continuous versions of 1-D self-organizing maps. The proposed method uses the k-segment algorithm to extract the principal curves. It then divides the principal curves into two or more curves, according to the number of clusters defined by the user. Next, the distance between each data point and each generated curve is calculated, and each point is assigned to the cluster whose curve is nearest. The method was applied to nine databases with different dimensionality and number of classes. The results were compared with three clustering algorithms: the k-means algorithm and the 1-D and 2-D self-organizing map algorithms. Experiments show that the method is suitable for clusters with elongated and spherical shapes and that, on some data sets, it achieved significantly better results than the other clustering algorithms used in this work.
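
The final assignment step described in the abstract (compute the distance from each data point to each curve, then pick the smallest) can be illustrated with a minimal sketch. It assumes that each curve is already available as a polyline, that is, an ordered array of vertices, and it does not implement the k-segment extraction or the curve-splitting step. The function names `point_to_polyline_distance` and `assign_to_nearest_curve`, and the toy curves, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def point_to_polyline_distance(points, vertices):
    """Euclidean distance from each point to the nearest point on a polyline.

    Assumption: the curve is given as an ordered list of at least two vertices.
    """
    points = np.asarray(points, dtype=float)      # shape (n, d)
    vertices = np.asarray(vertices, dtype=float)  # shape (m, d), m >= 2
    starts, ends = vertices[:-1], vertices[1:]    # segment endpoints, (m-1, d)
    seg = ends - starts
    # Guard against zero-length segments to avoid division by zero.
    seg_len_sq = np.maximum(np.sum(seg * seg, axis=1), 1e-12)
    # Projection parameter of every point onto every segment, clipped to [0, 1]
    # so the closest point stays on the segment itself.
    rel = points[:, None, :] - starts[None, :, :]                 # (n, m-1, d)
    t = np.clip(np.sum(rel * seg[None, :, :], axis=2) / seg_len_sq, 0.0, 1.0)
    closest = starts[None, :, :] + t[:, :, None] * seg[None, :, :]
    dists = np.linalg.norm(points[:, None, :] - closest, axis=2)  # (n, m-1)
    return dists.min(axis=1)


def assign_to_nearest_curve(points, curves):
    """Label each point with the index of the curve at the smallest distance."""
    dist_matrix = np.column_stack(
        [point_to_polyline_distance(points, c) for c in curves]
    )
    return dist_matrix.argmin(axis=1)


# Toy example: two horizontal polylines standing in for the per-cluster curves.
curve_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
curve_b = [[0.0, 3.0], [1.0, 3.0], [2.0, 3.0]]
points = [[0.5, 0.2], [1.5, 2.9], [3.0, 0.1]]

labels = assign_to_nearest_curve(points, [curve_a, curve_b])
print(labels)
```

In this toy setup the first and third points lie closer to `curve_a` and the second lies closer to `curve_b`, so a point beyond the end of a curve is still assigned by its distance to the nearest endpoint.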

Author information

Authors and Affiliations

Engineering Department, Federal University of Lavras (UFLA), P.O. Box 3037, Lavras, Minas Gerais, 37200-000, Brazil
Elson Claudio Correa Moraes, Danton Diego Ferreira & Bruno Henrique Groenner Barbosa
Computer Engineering, Federal University of Itajubá, Itabira, Minas Gerais, Brazil
Giovani Bernardes Vitor

Corresponding author

Correspondence to Danton Diego Ferreira.

About this article

Cite this article

Moraes, E.C.C., Ferreira, D.D., Vitor, G.B. et al. Data clustering based on principal curves. Adv Data Anal Classif 14, 77–96 (2020). https://doi.org/10.1007/s11634-019-00363-w

Received: 13 December 2017
Revised: 17 April 2019
Accepted: 04 June 2019
Published: 11 June 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11634-019-00363-w

Mathematics Subject Classification

68Txx Artificial intelligence
