Abstract
In this contribution we present a new method for data clustering based on principal curves. Principal curves are a nonlinear generalization of principal component analysis and may also be regarded as continuous versions of 1D self-organizing maps. The proposed method uses the k-segment algorithm to extract principal curves. The method then divides the principal curves into two or more curves, according to the number of clusters defined by the user. Next, the distance between each data point and each of the generated curves is calculated, and each point is assigned to the cluster whose curve is closest. The method was applied to nine databases with different dimensionality and numbers of classes. The results were compared with those of three clustering algorithms: k-means and the 1-D and 2-D self-organizing map algorithms. Experiments show that the method is suitable for clusters with elongated and spherical shapes and achieved significantly better results on some data sets than the other clustering algorithms used in this work.
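The final step described above, assigning each point to the nearest generated curve, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each curve is represented as a polygonal line (an ordered list of vertices) and that distance means Euclidean point-to-segment distance. The function names are hypothetical. The abstract does not specify how the principal curve is divided, so the already-divided curves are taken as input.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment [a, b]."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Degenerate segment (a == b): fall back to point-to-point distance.
    if denom == 0.0:
        return math.dist(p, a)
    # Project p onto the line through a and b, clamping to stay on the segment.
    t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    proj = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, proj)


def point_curve_distance(p: Point, curve: Sequence[Point]) -> float:
    """Distance from p to a polygonal curve given as ordered vertices (assumed representation)."""
    if len(curve) == 1:
        return math.dist(p, curve[0])
    return min(point_segment_distance(p, curve[i], curve[i + 1])
               for i in range(len(curve) - 1))


def assign_to_curves(points: Sequence[Point],
                     curves: Sequence[Sequence[Point]]) -> List[int]:
    """Hypothetical helper: label each point with the index of its nearest curve."""
    labels = []
    for p in points:
        dists = [point_curve_distance(p, c) for c in curves]
        labels.append(dists.index(min(dists)))
    return labels


# Two parallel horizontal curves, standing in for the pieces of a divided principal curve.
curves = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
          [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]]
print(assign_to_curves([(0.5, 0.2), (1.5, 2.9), (3.0, 0.0)], curves))  # → [0, 1, 0]
```

Using distance to segments instead of distance to the curve's vertices alone keeps the assignment from depending on how densely the curve is sampled. This matters for the elongated clusters the abstract mentions.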
Cite this article
Moraes, E.C.C., Ferreira, D.D., Vitor, G.B. et al. Data clustering based on principal curves. Adv Data Anal Classif 14, 77–96 (2020). https://doi.org/10.1007/s11634-019-00363-w