Reproducibility of Finding Enriched Gene Sets in Biological Data Analysis

  • Conference paper
11th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2017)

Abstract

The introduction of high-throughput measurement methods into molecular biology triggered the development of algorithms that search for disorders in complex signalling systems, such as pathways or gene ontologies. Many new solutions have appeared in recent years, but the results obtained with these techniques are ambiguous. In this work, five algorithms for pathway enrichment analysis were compared using six microarray datasets covering cases of the same disease. The number of enriched pathways at different significance levels and the false positive rate of finding enriched pathways were estimated, and the reproducibility of the results between datasets was checked. The best performance was obtained for the PLAGE method. However, taking into account biological knowledge about the analyzed disease, many of its findings may be false positives. Among the other methods, the GSVA algorithm gave the most reproducible results across the tested datasets, which was also validated in biological repositories. The GSEA method gave similarly good outcomes. ORA and PADOG showed poor sensitivity and reproducibility, which stands in contrast to previous research.
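The two quantities named in the abstract, the number of enriched pathways at a given significance level and the agreement of enriched pathways between datasets, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy p-values, and the choice to measure reproducibility as the number of datasets in which a pathway is enriched are all assumptions made for illustration, since the abstract does not specify the exact measure used.

```python
from collections import Counter


def enriched(pvalues, alpha):
    """Return the set of pathways whose p-value is below alpha."""
    return {name for name, p in pvalues.items() if p < alpha}


def counts_by_alpha(pvalues, alphas):
    """Count enriched pathways at each significance level."""
    return {alpha: len(enriched(pvalues, alpha)) for alpha in alphas}


def reproducibility(datasets, alpha):
    """For each pathway, count the datasets in which it is enriched.

    This tally is an assumed, simplified stand-in for a
    reproducibility measure; the paper may use a different one.
    """
    tally = Counter()
    for pvalues in datasets.values():
        tally.update(enriched(pvalues, alpha))
    return dict(tally)


# Hypothetical toy p-values (not taken from the paper): one dict per
# dataset, mapping a pathway name to the p-value reported by one method.
toy = {
    "dataset_A": {"pathway_1": 0.001, "pathway_2": 0.04, "pathway_3": 0.30},
    "dataset_B": {"pathway_1": 0.020, "pathway_2": 0.20, "pathway_3": 0.60},
    "dataset_C": {"pathway_1": 0.004, "pathway_2": 0.03, "pathway_3": 0.008},
}

per_level = counts_by_alpha(toy["dataset_A"], [0.01, 0.05])
shared = reproducibility(toy, 0.05)
```

In this toy setting, a pathway enriched in all three datasets would count as fully reproducible for the method that produced the p-values, while one enriched in a single dataset would be a candidate false positive.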

Acknowledgements

This work was financed by SUT grant no. BKM/506/RAU1/2016/t.26 (JZ), 02/010/BK_16/3015 (MM) and NCN grant no. 2015/19/B/ST6/01736 (JP). All calculations were carried out using GeCONiI infrastructure funded by NCBiR project no. POIG.02.03.01-24-099/13.

Author information

Corresponding author

Correspondence to Joanna Zyla.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zyla, J., Marczyk, M., Polanska, J. (2017). Reproducibility of Finding Enriched Gene Sets in Biological Data Analysis. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., Pinto, T. (eds) 11th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2017. Advances in Intelligent Systems and Computing, vol 616. Springer, Cham. https://doi.org/10.1007/978-3-319-60816-7_18

  • DOI: https://doi.org/10.1007/978-3-319-60816-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60815-0

  • Online ISBN: 978-3-319-60816-7

  • eBook Packages: Engineering, Engineering (R0)
