Smart Persistence and Accessibility of Genomic and Clinical Data

  • Conference paper
Database and Expert Systems Applications (DEXA 2019)

Abstract

The continuous growth of experimental data generated by Next Generation Sequencing (NGS) machines has led to the adoption of advanced techniques to manage these data intelligently. The advent of the Big Data era posed new challenges and led to the development of novel methods and tools, which were originally designed to address computational science problems but can nowadays be widely applied to biomedical data. In this work, we address two biomedical data management issues: (i) how to reduce the redundancy of genomic and clinical data, and (ii) how to make this large amount of data easily accessible. First, we propose an approach to organize genomic and clinical data optimally by taking data redundancy into account, together with a method that saves as much storage space as possible by exploiting NoSQL technologies. Then, we propose design principles for organizing biomedical data and making them easily accessible through a collection of Application Programming Interfaces (APIs), which together form a flexible framework that we call OpenOmics. To prove the validity of our approach, we apply it to data extracted from the Genomic Data Commons repository. OpenOmics is free and open source, so that anyone can extend the provided set of APIs with new features that answer specific biological questions. The APIs are hosted on GitHub at https://github.com/fabio-cumbo/open-omics-api/, are publicly queryable at http://bioinformatics.iasi.cnr.it/openomics/api/routes, and are documented at https://openomics.docs.apiary.io/.
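The abstract does not describe the storage schema, so the following is only a minimal sketch of one generic way to remove redundancy, assuming that it arises because the same clinical record is repeated across many genomic files of a single case. It uses content-addressed deduplication: each distinct clinical sub-document is stored once under a SHA-256 key, and each genomic record keeps only that key. The field names (`file_id`, `experiment`, `clinical`, `case_id`, ...) and the functions `content_key`, `deduplicate`, and `rehydrate` are hypothetical illustrations, not part of OpenOmics.

```python
import hashlib
import json
from typing import Any

Record = dict[str, Any]


def content_key(doc: Record) -> str:
    """Return a stable SHA-256 key for a JSON-serializable document."""
    # sort_keys makes the key independent of the order of the fields.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(
    records: list[Record], shared_field: str = "clinical"
) -> tuple[dict[str, Record], list[Record]]:
    """Split records into a store of unique shared sub-documents
    and slim records that reference them by key (hypothetical schema)."""
    shared_store: dict[str, Record] = {}
    slim_records: list[Record] = []
    for record in records:
        shared = record[shared_field]
        key = content_key(shared)
        # Each distinct sub-document is stored only once.
        shared_store.setdefault(key, shared)
        slim = {k: v for k, v in record.items() if k != shared_field}
        slim[shared_field + "_ref"] = key
        slim_records.append(slim)
    return shared_store, slim_records


def rehydrate(
    slim_record: Record, shared_store: dict[str, Record], shared_field: str = "clinical"
) -> Record:
    """Rebuild a full record from a slim record and the shared store."""
    ref_field = shared_field + "_ref"
    record = {k: v for k, v in slim_record.items() if k != ref_field}
    record[shared_field] = shared_store[slim_record[ref_field]]
    return record


if __name__ == "__main__":
    # Hypothetical data: two files of case c1 repeat the same clinical data.
    c1 = {"case_id": "c1", "gender": "female", "vital_status": "alive"}
    c2 = {"case_id": "c2", "gender": "male", "vital_status": "dead"}
    records = [
        {"file_id": "f1", "experiment": "gene_expression", "clinical": c1},
        {"file_id": "f2", "experiment": "methylation", "clinical": dict(c1)},
        {"file_id": "f3", "experiment": "gene_expression", "clinical": c2},
    ]
    store, slim = deduplicate(records)
    print(len(store), len(slim))  # prints: 2 3
```

In a NoSQL document store such as MongoDB, `shared_store` and the slim records would map naturally onto two collections linked by the reference key; whether OpenOmics uses this exact layout is not stated in the abstract.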

Author information

Corresponding author

Correspondence to Emanuel Weitschek.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Cappelli, E., Weitschek, E., Cumbo, F. (2019). Smart Persistence and Accessibility of Genomic and Clinical Data. In: Anderst-Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2019. Communications in Computer and Information Science, vol 1062. Springer, Cham. https://doi.org/10.1007/978-3-030-27684-3_2

  • DOI: https://doi.org/10.1007/978-3-030-27684-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27683-6

  • Online ISBN: 978-3-030-27684-3

  • eBook Packages: Computer Science, Computer Science (R0)
