
A study of feature representation via neural network feature extraction and weighted distance for clustering


Abstract

Neural networks are well known for their ability to classify and cluster data sets by passing raw data through multiple layers that progressively transform the information it carries. The feature layer projects the raw data into a space spanned by hidden features. To understand data representations in both the original (i.e., image) space and the feature space, the main purpose of this research is to analyze clustering performance under different feature representations. Naturally, distance measures have a great impact on clustering performance, so different distances and their combinations are tested on both the original and feature spaces. The combined distances were obtained from optimal weights that minimize classification errors under different measures via a series of optimization models; each weight was multiplied by its respective distance and the results summed to form the combined distance. Clustering quality was evaluated using silhouette scores. The feature space generally yields better clustering performance than the image space, with cosine similarity being the best distance in both spaces.
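To make the weighted-distance idea above concrete, the following Python sketch (not the authors' implementation) blends several pairwise distance matrices into one combined distance, clusters on it, and scores the result with the silhouette coefficient. The metric choices, the example weights, and the helper names are illustrative assumptions; average-linkage hierarchical clustering is used here only because it accepts a precomputed distance matrix directly, and the weights would in practice come from the optimization models described in the abstract.

import numpy as np
from scipy.spatial.distance import cdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

def combined_distance(X, metrics, weights):
    # Weighted sum of pairwise distance matrices, one matrix per metric.
    # The weights are assumed to be supplied by an upstream optimization model.
    D = sum(w * cdist(X, X, metric=m) for w, m in zip(weights, metrics))
    np.fill_diagonal(D, 0.0)  # remove numerical noise on the diagonal
    return D

def cluster_and_score(X, n_clusters, metrics, weights):
    D = combined_distance(X, metrics, weights)
    # Average-linkage hierarchical clustering on the precomputed distances.
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Silhouette score evaluated directly on the combined distance matrix.
    return silhouette_score(D, labels, metric="precomputed")

# Toy usage on synthetic data (two well-separated Gaussian blobs).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 16)), rng.normal(4.0, 1.0, (50, 16))])
score = cluster_and_score(X, n_clusters=2,
                          metrics=("euclidean", "cityblock", "cosine"),
                          weights=(0.5, 0.3, 0.2))
print(f"silhouette score: {score:.3f}")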


Data availability

Enquiries about data availability should be directed to the authors.


Acknowledgements

This article is based on basic research work supported by the AFRL Mathematical Modeling and Optimization Institute.

Funding

The work was supported in part by the U.S. Air Force Research Laboratory (AFRL) award FA8651-16-2-0009.

Author information


Corresponding author

Correspondence to Qipeng P. Zheng.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer: This article includes comparisons with, and improvements to, the authors' previous conference article.

About this article


Cite this article

Schleider, L., Pasiliao, E.L., Qiang, Z. et al. A study of feature representation via neural network feature extraction and weighted distance for clustering. J Comb Optim 44, 3083–3105 (2022). https://doi.org/10.1007/s10878-022-00849-y

