
A study of feature representation via neural network feature extraction and weighted distance for clustering


Abstract

Neural networks are well known for their ability to classify and cluster data sets by passing raw data through multiple layers that progressively transform the information it carries. The feature layer projects the raw data into a space spanned by hidden features. To understand data representations in both the original (i.e., image) space and the feature space, the main purpose of this research is to analyze clustering performance under different feature representations. Naturally, distance measures have a great impact on clustering performance, so different distances and their combinations are tested on both the original and feature spaces. The combined distances were obtained from optimal weights that minimize classification errors under different measures via a series of optimization models; each weight was multiplied by its respective distance and the results summed to form the combined distance. Clustering quality was evaluated using silhouette scores. The feature space generally yields better clustering performance than the image space, with cosine similarity being the best distance in both spaces.
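To make the weighted-distance idea above concrete, the following Python sketch (not the authors' implementation) blends several pairwise distance matrices into one combined distance, clusters on it, and scores the result with the silhouette coefficient. The metric choices, the example weights, and the helper names are illustrative assumptions; average-linkage hierarchical clustering is used here only because it accepts a precomputed distance matrix directly, and the weights would in practice come from the optimization models described in the abstract.

import numpy as np
from scipy.spatial.distance import cdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

def combined_distance(X, metrics, weights):
    # Weighted sum of pairwise distance matrices, one matrix per metric.
    # The weights are assumed to be supplied by an upstream optimization model.
    D = sum(w * cdist(X, X, metric=m) for w, m in zip(weights, metrics))
    np.fill_diagonal(D, 0.0)  # remove numerical noise on the diagonal
    return D

def cluster_and_score(X, n_clusters, metrics, weights):
    D = combined_distance(X, metrics, weights)
    # Average-linkage hierarchical clustering on the precomputed distances.
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Silhouette score evaluated directly on the combined distance matrix.
    return silhouette_score(D, labels, metric="precomputed")

# Toy usage on synthetic data (two well-separated Gaussian blobs).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(4.0, 1.0, (50, 16))])
score = cluster_and_score(X, n_clusters=2,
                          metrics=("euclidean", "cityblock", "cosine"),
                          weights=(0.5, 0.3, 0.2))
print(f"silhouette score: {score:.3f}")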


Data availability

Enquiries about data availability should be directed to the authors.


Acknowledgements

This article is based on basic research work supported by the AFRL Mathematical Modeling and Optimization Institute.

Funding

The work was supported in part by the U.S. Air Force Research Laboratory (AFRL) award FA8651-16-2-0009.

Author information


Corresponding author

Correspondence to Qipeng P. Zheng.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer: This article includes comparisons with, and improvements to, the authors' previous conference article.

About this article


Cite this article

Schleider, L., Pasiliao, E.L., Qiang, Z. et al. A study of feature representation via neural network feature extraction and weighted distance for clustering. J Comb Optim 44, 3083–3105 (2022). https://doi.org/10.1007/s10878-022-00849-y

