Abstract
Discovering groups of similar samples is a common first step in the analysis of biomedical data, where such groups cannot be identified manually. Many supervised and unsupervised machine learning and statistical approaches have been developed to solve this problem. Clustering is an unsupervised learning approach that organizes data into groups of similar items and is used to discover the intrinsic hidden structure of the data. In this paper, we use the clustering by fast search and find of density peaks (CDP) approach for cancer subtyping and for distinguishing normal tissues from tumor tissues. In addition, we examine the impact of preprocessing and of the underlying distance matrix on the final groups. We performed extensive experiments on real-world and synthetic cancer gene expression microarray data sets and compared the results with state-of-the-art clustering approaches.
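To make the pipeline described above concrete, the following is a minimal sketch of density peaks clustering applied to a samples-by-genes matrix. It is not the authors' implementation. The abstract does not state the kernel, the cutoff rule, or the preprocessing, so the z-score normalization, the Gaussian density kernel, the 2% distance percentile used for the cutoff, the user-supplied number of clusters, and all function names here are illustrative assumptions.

```python
import numpy as np

def zscore(X):
    """Assumed preprocessing: scale each gene (column) to zero mean, unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std

def pairwise_euclidean(X):
    """Euclidean distance between every pair of samples (rows)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def correlation_distance(X):
    """Alternative distance: 1 - Pearson correlation between samples (rows)."""
    D = 1.0 - np.corrcoef(X)
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, None)

def density_peaks(D, n_clusters, dc_percentile=2.0):
    """Cluster from a distance matrix D; returns (labels, center indices)."""
    n = D.shape[0]
    # Cutoff distance d_c: a low percentile of all pairwise distances (assumption).
    dc = np.percentile(D[np.triu_indices(n, k=1)], dc_percentile)
    # Local density rho with a Gaussian kernel; subtract the self term exp(0) = 1.
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # sample indices by decreasing density
    # delta: distance to the nearest sample of higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()  # convention for the densest sample
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j
    # Centers combine high density with a large distance to any denser sample.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density maximum is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining sample inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Calling `density_peaks(pairwise_euclidean(zscore(X)), n_clusters=2)` on an expression matrix `X` returns one label per sample. Passing `correlation_distance(X)` instead, or skipping `zscore`, is one way to see how preprocessing and the distance matrix change the final groups.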
Funding
This research was sponsored by the National Natural Science Foundation of China (Nos. 61571049, 61371185, 61401029, 61472044, and 61472403), the Fundamental Research Funds for the Central Universities (Nos. 2014KJJCB32 and 2013NT57), and SRF for ROCS, SEM.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Cite this article
Mehmood, R., El-Ashram, S., Bie, R. et al. Effective cancer subtyping by employing density peaks clustering by using gene expression microarray. Pers Ubiquit Comput 22, 615–619 (2018). https://doi.org/10.1007/s00779-018-1112-y
DOI: https://doi.org/10.1007/s00779-018-1112-y