Abstract
Increasingly, advances in biomedical research result from combining and analyzing heterogeneous data types from different sources, spanning genomic, proteomic, imaging, and clinical data. Yet despite the proliferation of data-driven methods, tools that support the integration and management of large data collections for data-driven discovery are scarce, leaving scientists with ad hoc and inefficient processes. The scientific process could benefit significantly from lightweight methods for data integration that allow for exploratory, incrementally refined integration of heterogeneous data. In this paper, we address this problem by introducing a new asset-management-based approach designed to support continuous integration of biomedical data. We describe the system and our experiences using it in the context of several scientific applications.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Schuler, R.E., Kesselman, C., Czajkowski, K. (2014). An Asset Management Approach to Continuous Integration of Heterogeneous Biomedical Data. In: Galhardas, H., Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2014. Lecture Notes in Computer Science, vol 8574. Springer, Cham. https://doi.org/10.1007/978-3-319-08590-6_1
DOI: https://doi.org/10.1007/978-3-319-08590-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08589-0
Online ISBN: 978-3-319-08590-6
eBook Packages: Computer Science; Computer Science (R0)