Abstract
The introduction of high-throughput measurement methods into molecular biology prompted the development of algorithms that search for disorders in complex signalling systems, such as pathways or gene ontologies. Many new solutions have appeared in recent years, but the results obtained with these techniques are ambiguous. In this work, five algorithms for pathway enrichment analysis were compared using six microarray datasets covering cases of the same disease. The number of enriched pathways at different significance levels and the false positive rate of finding enriched pathways were estimated, and the reproducibility of the results between datasets was checked. The best performance was obtained for the PLAGE method. However, given the biological knowledge about the analyzed disease condition, many of its findings may be false positives. Among the other methods, the GSVA algorithm gave the most reproducible results across the tested datasets, which was also validated in biological repositories. The GSEA method gave similarly good outcomes. ORA and PADOG showed poor sensitivity and reproducibility, which stands in contrast to previous research.
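To make the idea of reproducibility between datasets concrete, the sketch below shows one possible way to quantify it: take the set of pathways called enriched in each dataset at a chosen significance level and compare the sets pairwise. The abstract does not state which measure the authors used, so the Jaccard index here is an assumption, and the function names, dataset names, and p-values are hypothetical and purely illustrative.

```python
from itertools import combinations


def enriched(pvalues, alpha=0.05):
    """Return the set of pathways whose p-value is below alpha."""
    return {pathway for pathway, p in pvalues.items() if p < alpha}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_reproducibility(results, alpha=0.05):
    """Map each pair of dataset names to the Jaccard index of their enriched sets.

    Assumption: overlap of enriched sets is used as a stand-in for
    reproducibility; this is not necessarily the measure used in the paper.
    """
    sets = {name: enriched(pv, alpha) for name, pv in results.items()}
    return {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


# Made-up p-values for illustration only (not taken from the paper).
toy_results = {
    "dataset_A": {"pathway_1": 0.001, "pathway_2": 0.20, "pathway_3": 0.03},
    "dataset_B": {"pathway_1": 0.010, "pathway_2": 0.04, "pathway_3": 0.02},
}
scores = pairwise_reproducibility(toy_results)
```

Running the same comparison at several values of alpha would show how strongly the agreement between datasets depends on the chosen significance level.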
Acknowledgements
This work was financed by SUT grant no. BKM/506/RAU1/2016/t.26 (JZ), 02/010/BK_16/3015 (MM) and NCN grant no. 2015/19/B/ST6/01736 (JP). All calculations were carried out using GeCONiI infrastructure funded by NCBiR project no. POIG.02.03.01-24-099/13.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zyla, J., Marczyk, M., Polanska, J. (2017). Reproducibility of Finding Enriched Gene Sets in Biological Data Analysis. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_18
DOI: https://doi.org/10.1007/978-3-319-60816-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60815-0
Online ISBN: 978-3-319-60816-7
eBook Packages: Engineering, Engineering (R0)