Smart Persistence and Accessibility of Genomic and Clinical Data

Cappelli, Eleonora; Weitschek, Emanuel; Cumbo, Fabio

doi:10.1007/978-3-030-27684-3_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1062)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

The continuous growth of experimental data generated by Next Generation Sequencing (NGS) machines has led to the adoption of advanced techniques to manage them intelligently. The advent of the Big Data era posed new challenges that led to the development of novel methods and tools, which were originally designed to tackle computational science problems but can now be widely applied to biomedical data. In this work, we address two biomedical data management issues: (i) how to reduce the redundancy of genomic and clinical data, and (ii) how to make this large amount of data easily accessible. First, we propose an approach that organizes genomic and clinical data with data redundancy in mind, together with a method that saves as much space as possible by exploiting NoSQL technologies. Then, we propose design principles for organizing biomedical data and making them easily accessible through a collection of Application Programming Interfaces (APIs), which together form a flexible framework that we call OpenOmics. To prove the validity of our approach, we apply it to data extracted from the Genomic Data Commons repository. OpenOmics is free and open source, so that anyone can extend the set of provided APIs with new features to answer specific biological questions. The APIs are hosted on GitHub at https://github.com/fabio-cumbo/open-omics-api/, are publicly queryable at http://bioinformatics.iasi.cnr.it/openomics/api/routes, and are documented at https://openomics.docs.apiary.io/.
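
The abstract does not spell out the storage schema, so the following is only a minimal sketch of one common way to reduce redundancy in a document-oriented store: annotation fields that repeat identically across samples are stored once and referenced by key. The function names, field names (`gene`, `chrom`, `fpkm`, and so on), and all values are hypothetical placeholders, not taken from the paper or from the Genomic Data Commons, and plain Python dictionaries stand in for NoSQL collections.

```python
import json

# Hypothetical field names: annotation values that are identical for every
# sample measuring the same gene, and therefore redundant in flat records.
SHARED_KEYS = ("chrom", "start", "end", "strand")


def split_redundant_fields(records, shared_keys, id_key):
    """Split flat records into per-sample measurements and a single
    de-duplicated annotation table keyed by id_key."""
    annotations = {}
    measurements = []
    for rec in records:
        # Store the shared annotation only the first time its key is seen.
        annotations.setdefault(rec[id_key], {k: rec[k] for k in shared_keys})
        # Keep the sample-specific fields; id_key remains as the reference.
        measurements.append({k: v for k, v in rec.items() if k not in shared_keys})
    return measurements, annotations


def rebuild(measurements, annotations, id_key):
    """Rejoin measurements with their annotations to recover the flat records."""
    return [{**m, **annotations[m[id_key]]} for m in measurements]


# Placeholder data: two samples, two genes, made-up coordinates and values.
records = [
    {"sample": "S1", "gene": "GENE_A", "chrom": "chr1", "start": 1000, "end": 5000, "strand": "+", "fpkm": 12.4},
    {"sample": "S2", "gene": "GENE_A", "chrom": "chr1", "start": 1000, "end": 5000, "strand": "+", "fpkm": 8.1},
    {"sample": "S1", "gene": "GENE_B", "chrom": "chr2", "start": 7000, "end": 9500, "strand": "-", "fpkm": 3.7},
    {"sample": "S2", "gene": "GENE_B", "chrom": "chr2", "start": 7000, "end": 9500, "strand": "-", "fpkm": 5.2},
]

measurements, annotations = split_redundant_fields(records, SHARED_KEYS, "gene")

# Compare the serialized sizes of the flat and the split representations.
flat_size = len(json.dumps(records))
split_size = len(json.dumps(measurements)) + len(json.dumps(annotations))
print("flat:", flat_size, "characters; split:", split_size, "characters")
```

In this toy layout the saving grows with the number of samples per gene, since each annotation is stored once regardless of how many measurements reference it, and `rebuild` shows that the original records can still be recovered on read.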

Author information

Authors and Affiliations

Department of Engineering, Roma Tre University, 00146, Rome, Italy
Eleonora Cappelli
Department of Engineering, Uninettuno University, 00186, Rome, Italy
Emanuel Weitschek
CIBIO Department, University of Trento, 38123, Trento, Italy
Fabio Cumbo

Corresponding author

Correspondence to Emanuel Weitschek.

Editor information

Editors and Affiliations

Institute of Telecooperation, Johannes Kepler University of Linz, Linz, Oberösterreich, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg, Austria
A Min Tjoa
Institute of Telecooperation, Johannes Kepler University of Linz, Linz, Oberösterreich, Austria
Ismail Khalil
ENSIT, LaTICE, University of Tunis, Tunis, Tunisia
Mourad Elloumi
Software Competence Center, Hagenberg, Austria
Atif Mashkoor
Steyregg, Oberösterreich, Austria
Johannes Sametinger
Edificio 204, ICT Division, TECNALIA, Derio, Vizcaya, Spain
Xabier Larrucea
Top 74, Innsbruck, Tirol, Austria
Anna Fensel
Hagenberg Gmbh, Software Competence Center, Hagenberg im Mühlkreis, Oberösterreich, Austria
Jorge Martinez-Gil
Software Competence Center Hagenberg, SC, Hagenberg im Mühlkreis, Oberösterreich, Austria
Bernhard Moser
University of Twente, Enschede, Overijssel, The Netherlands
Christin Seifert
Bauhaus Universität Weimar, Weimar, Thüringen, Germany
Benno Stein
MiCS, Media Computer Science, University of Passau, Passau, Bayern, Germany
Michael Granitzer

About this paper

Cite this paper

Cappelli, E., Weitschek, E., Cumbo, F. (2019). Smart Persistence and Accessibility of Genomic and Clinical Data. In: Anderst-Kotsis, G., et al. Database and Expert Systems Applications. DEXA 2019. Communications in Computer and Information Science, vol 1062. Springer, Cham. https://doi.org/10.1007/978-3-030-27684-3_2

DOI: https://doi.org/10.1007/978-3-030-27684-3_2
Published: 01 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27683-6
Online ISBN: 978-3-030-27684-3
eBook Packages: Computer Science, Computer Science (R0)
