Efficient Incremental Maintenance of Derived Relations and BLAST Computations in Bioinformatics Data Warehouses

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

In the data-driven field of bioinformatics, data warehouses have emerged as common solutions to facilitate data analysis. The uncertainty, complexity and change rate of biological data underscore the importance of capturing its evolution. To this end, we incorporate a temporal dimension in our data model, implemented as lifespan timestamps attached to every tuple in the warehouse. This temporal information allows us to keep a full history of the warehouse and recreate any past version for auditing purposes. Equally important, it facilitates the incremental maintenance of the warehouse. We maintain the warehouse incrementally not only for relations derived by applying the standard relational operators but also for computed relations. In particular, we consider computed relations obtained through external BLAST sequence alignment computations, which are often identified as a bottleneck in the integrated warehouse maintenance process. Our experiments with subsets of protein sequences from the NCBI non-redundant database demonstrate at least 10-fold speedups for realistic target space size increases of 1% to 5%.
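
The lifespan-timestamp idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `VersionedTuple` class, the half-open `[start, end)` lifespan convention, the integer timestamps, and the helper names `snapshot` and `delta_since` are all assumptions made for illustration. It shows how per-tuple lifespans might allow both recreating a past version and extracting the changes since an earlier point in time, which is the input an incremental maintenance step would need.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class VersionedTuple:
    """A warehouse tuple with a lifespan (hypothetical layout)."""
    key: str
    value: str
    start: int                  # time at which the tuple became valid
    end: Optional[int] = None   # time at which it was retired; None = still current


def snapshot(rows: List[VersionedTuple], t: int) -> List[VersionedTuple]:
    """Recreate the version at time t: tuples with start <= t < end."""
    return [r for r in rows if r.start <= t and (r.end is None or t < r.end)]


def delta_since(
    rows: List[VersionedTuple], t: int
) -> Tuple[List[VersionedTuple], List[VersionedTuple]]:
    """Return (inserted, deleted) relative to the version at time t.

    inserted: tuples that are current now but did not exist at time t.
    deleted:  tuples that were alive at time t but have since been retired.
    """
    inserted = [r for r in rows if r.start > t and r.end is None]
    deleted = [r for r in rows if r.start <= t and r.end is not None and r.end > t]
    return inserted, deleted


# Made-up example data: an update is modeled as retiring the old tuple
# and inserting a new one with the same key.
rows = [
    VersionedTuple("P1", "MKV", start=1),
    VersionedTuple("P2", "MAA", start=1, end=3),
    VersionedTuple("P2", "MAB", start=3),
    VersionedTuple("P3", "MQQ", start=4),
]

past = snapshot(rows, 2)                 # the warehouse as it looked at time 2
inserted, deleted = delta_since(rows, 2)  # what changed after time 2
```

Under these assumptions, a derived relation last refreshed at time 2 would only need to process the `inserted` and `deleted` tuples rather than the full relation. How the paper applies such deltas to relational operators and to BLAST results is not described in the abstract and is not shown here.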

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Turcu, G., Nestorov, S., Foster, I. (2008). Efficient Incremental Maintenance of Derived Relations and BLAST Computations in Bioinformatics Data Warehouses. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_13

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science, Computer Science (R0)
