Abstract
In this paper, we compare different sampling algorithms used for identifying the defective pathways in highly underdetermined phenotype prediction problems. The first algorithm (Fisher’s ratio sampler) selects the most discriminatory genes and samples the highly discriminatory genetic networks according to a prior probability that is proportional to their individual Fisher’s ratio. The second one (holdout sampler) is inspired by the bootstrapping procedure used in regression analysis and uses the minimum-scale signatures found in different random holdouts to establish the most frequently sampled genes. The third one is a pure random sampler, which randomly builds networks of differentially expressed genes. In all these algorithms, the likelihood of the different networks is established via leave-one-out cross-validation (LOOCV), and the posterior analysis of the most frequently sampled genes serves to establish the altered biological pathways. The results of these algorithms are compared to those obtained via Bayesian Networks (BNs). We show the application of these algorithms to a microarray dataset concerning Triple Negative Breast Cancers. This comparison shows that the Random, Fisher’s ratio and Holdout samplers are more effective than BNs, and that all of them provide similar insights about the genetic mechanisms involved in this disease. Therefore, it can be concluded that all these samplers are good alternatives to Bayesian Networks, with much lower computational demands. Moreover, this analysis confirms the insight that the altered pathways should be independent of the sampling methodology and of the classifier that is used to infer them.
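The Fisher’s ratio sampler described above can be illustrated with a minimal sketch. It is not the authors’ implementation: the two-class form of the Fisher’s ratio, the nearest-centroid classifier used inside LOOCV, and the names and parameters `fisher_ratio`, `loocv_accuracy`, `sample_networks`, `n_networks`, `size` and `min_acc` are all assumptions made for illustration.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-gene Fisher's ratio for two classes (assumed form):
    (mean1 - mean0)^2 / (var1 + var0)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X0.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def loocv_accuracy(X, y):
    """LOOCV accuracy of a nearest-centroid classifier (illustrative choice)."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False  # leave sample i out of the training set
        Xtr, ytr = X[mask], y[mask]
        c0 = Xtr[ytr == 0].mean(axis=0)
        c1 = Xtr[ytr == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        hits += int(pred == y[i])
    return hits / n

def sample_networks(X, y, n_networks=200, size=5, min_acc=0.9, seed=0):
    """Draw small gene networks with prior probability proportional to the
    Fisher's ratio, keep those whose LOOCV accuracy reaches min_acc, and
    count how often each gene appears in the accepted networks."""
    rng = np.random.default_rng(seed)
    fr = fisher_ratio(X, y)
    prior = fr / fr.sum()
    counts = np.zeros(X.shape[1], dtype=int)
    accepted = 0
    for _ in range(n_networks):
        genes = rng.choice(X.shape[1], size=size, replace=False, p=prior)
        if loocv_accuracy(X[:, genes], y) >= min_acc:
            counts[genes] += 1
            accepted += 1
    return counts, accepted
```

Here `X` is assumed to be a samples-by-genes expression matrix and `y` a vector of 0/1 class labels. Replacing `prior` with a uniform distribution over the differentially expressed genes would give a sketch of the random sampler; the genes with the highest `counts` are the ones that a posterior pathway analysis would start from.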
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Cernea, A. et al. (2018). Comparison of Different Sampling Algorithms for Phenotype Prediction. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10814. Springer, Cham. https://doi.org/10.1007/978-3-319-78759-6_4
DOI: https://doi.org/10.1007/978-3-319-78759-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78758-9
Online ISBN: 978-3-319-78759-6
eBook Packages: Computer Science (R0)