Abstract
In the data-driven field of bioinformatics, data warehouses have emerged as a common solution for facilitating data analysis. The uncertainty, complexity, and rate of change of biological data underscore the importance of capturing its evolution. To capture how our database evolves, we incorporate a temporal dimension into our data model, implemented as lifespan timestamps attached to every tuple in the warehouse. This temporal information lets us keep a full history of the warehouse and recreate any past version for auditing. Equally important, it facilitates incremental maintenance of the warehouse. We maintain the warehouse incrementally not only for relations derived by applying the standard relational operators but also for computed relations. In particular, we consider computed relations obtained through external BLAST sequence alignment computations, which are often identified as a bottleneck in the integrated warehouse maintenance process. Our experiments with subsets of protein sequences from the NCBI non-redundant database demonstrate at least 10-fold speedups for realistic target space size increases of 1% to 5%.
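To make the lifespan-timestamp idea concrete, here is a minimal sketch. It is not the authors' implementation. The names `Row`, `TemporalRelation`, `snapshot`, and `delta` are hypothetical. So are the integer timestamps and the half-open lifespan convention `[t_start, t_end)`, which are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FOREVER = None  # assumed marker for an open-ended lifespan (tuple is still current)


@dataclass
class Row:
    key: str                          # e.g. a sequence identifier (hypothetical)
    value: str                        # e.g. the sequence itself (hypothetical)
    t_start: int                      # time the tuple entered the warehouse
    t_end: Optional[int] = FOREVER    # time it was logically deleted (exclusive)


class TemporalRelation:
    """A relation whose tuples carry lifespan timestamps; nothing is physically deleted."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, key: str, value: str, now: int) -> None:
        self.rows.append(Row(key, value, now))

    def delete(self, key: str, now: int) -> None:
        # Logical delete: close the lifespan of the current version.
        for r in self.rows:
            if r.key == key and r.t_end is FOREVER:
                r.t_end = now

    def snapshot(self, t: int) -> Dict[str, str]:
        # Recreate the relation as it stood at time t.
        return {
            r.key: r.value
            for r in self.rows
            if r.t_start <= t and (r.t_end is FOREVER or t < r.t_end)
        }

    def delta(self, t_old: int, t_new: int) -> Tuple[List[str], List[str]]:
        # Keys inserted and keys deleted in the interval (t_old, t_new].
        added = [r.key for r in self.rows if t_old < r.t_start <= t_new]
        removed = [
            r.key
            for r in self.rows
            if r.t_end is not FOREVER and t_old < r.t_end <= t_new
        ]
        return added, removed
```

Under these assumptions, `snapshot(t)` reproduces a past version for auditing. `delta(t_old, t_new)` isolates the tuples that changed since the last maintenance run, so a derived or computed relation could be refreshed from the changes alone instead of being recomputed from scratch. The sketch does not model the BLAST step, and BLAST E-values depend on the size of the search space, so an incremental scheme would also have to account for that.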
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Turcu, G., Nestorov, S., Foster, I. (2008). Efficient Incremental Maintenance of Derived Relations and BLAST Computations in Bioinformatics Data Warehouses. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_13
DOI: https://doi.org/10.1007/978-3-540-85836-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science (R0)