Skip to main content

FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation

  • Conference paper
Algorithms in Bioinformatics (WABI 2012)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 7534))

Included in the following conference series:

Abstract

With the increased democratization of sequencing, the reliance of sequence assembly programs on heuristics is at odds with the need for black-box assembly solutions that can be used reliably by non-specialists. In this work, we present a formal definition for in silico assembly validation and finishing and explore the feasibility of an exact solution for this problem using quadratic programming (FinIS). Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both precision and recall values for the correctness of assembled sequences, when compared to competing programs. Source code and executables for FinIS are freely available at http://sourceforge.net/projects/finis/ .

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Li, Y., Zheng, H., Luo, R., et al.: Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nature Biotechnology 29, 6723–6730 (2011)

    Google Scholar 

  2. Birol, I., Jackman, S.D., Nielsen, C.B., et al.: De novo transcriptome assembly with ABySS. Bioinformatics 25(21), 2872–2877 (2009)

    Article  Google Scholar 

  3. Woyke, T., Teeling, H., Ivanova, N.N., et al.: Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950–955 (2006)

    Article  Google Scholar 

  4. Nagarajan, N., Pop, M.: Sequencing and genome assembly using next-generation technologies. Methods in Molecular Biology 673, 1–17 (2010)

    Article  Google Scholar 

  5. Baker, M.: De novo genome assembly: what every biologist should know. Nature Methods 9, 333–337 (2012)

    Article  Google Scholar 

  6. Nagarajan, N., Pop, M.: Parametric complexity of sequence assembly: theory and applications to next generation sequencing. Journal of Computational Biology 16(7), 897–908 (2009)

    Article  MathSciNet  Google Scholar 

  7. Gao, S., Sung, W.K., Nagarajan, N.: Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology 18(11), 1681–1691 (2011)

    Article  MathSciNet  Google Scholar 

  8. Pop, M., Kosack, S.D., Salzberg, S.L.: Hierarchical scaffolding with bambus. Genome Research 14, 149–159 (2004)

    Article  Google Scholar 

  9. Nagarajan, N., Read, T.D., Pop, M.: Scaffolding and validation of bacterial genome assemblies using optical restriction maps. Bioinformatics 24(10), 1229–1235 (2008)

    Article  Google Scholar 

  10. Pop, M., Phillipy, A., Delcher, A.L., Salzberg, S.L.: Comparative genome assembly. Briefings in Bioinformatics 5(3), 237–248 (2004)

    Article  Google Scholar 

  11. Nagarajan, N., Cook, C., Bonaventura, M.D., et al.: Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. BMC Genomics 11(242) (2010)

    Google Scholar 

  12. Zerbino, D.R., McEwen, G.K., Marguiles, E.H., Birney, E.: Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS ONE 4(12) (2009)

    Google Scholar 

  13. Li, R.H., Zhu, J., Ruan, W., et al.: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20, 265–272 (2010)

    Article  Google Scholar 

  14. Tsai, I.J., Otto, T.D., Berriman, M.: Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biology 11, R41 (2010)

    Article  Google Scholar 

  15. Kececioglu, J.D., Myers, E.W.: Combinatorial algorithms for DNA sequence assembly. Algorithmica 13, 7–51 (1993)

    Article  MathSciNet  Google Scholar 

  16. Pevzner, P.A., Tang, H., Waterman, M.S.: A Eularian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences 98(17), 9748–9753 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  17. Myers, E.W.: The fragment assembly string graph. Bioinformatics 21(2), 79–85 (2005)

    Article  Google Scholar 

  18. Zerbino, D., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research (2008), doi:10.1101/gr.074492.107

    Google Scholar 

  19. Karger, D., Motwani, R., Ramkumar, G.D.S.: On approximating the longest path in a graph. Algorithmica 18, 421–432 (1993)

    MathSciNet  Google Scholar 

  20. Kleinberg, J.M.: Approximation algorithms for disjoint path problems. Ph.D Thesis, Dept. of EECS. MIT (1996)

    Google Scholar 

  21. Fleischner, H.: Algorithms for Eulerian Trails, Eulerian Graphs and Related Topics. Annals of Discrete Mathematics, Part 1 2(50), X.1C13 (1991)

    Google Scholar 

  22. Kingsford, C., Schatz, M.C., Pop, M.: Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11(21) (2010)

    Google Scholar 

  23. Richter, D.C., Ott, F., Schmid, R., Huson, D.H.: Metasim: a sequencing simulator for genomics and metagenomics. PloS One 3(10) (2008)

    Google Scholar 

  24. Kelley, D.R., Schatz, M.C., Salzberg, S.L.: Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11, R116 (2010)

    Article  Google Scholar 

  25. Kurtz, S.A., Phillippy, A., Delcher, A.L., et al.: Versatile and open software for comparing large genomes. Genome Biology 5, R12 (2004)

    Article  Google Scholar 

  26. Katoh, K., Misawa, K., Kuma, K., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30(14) (2002)

    Google Scholar 

  27. Jarrod, A.C., Isaac, H., Sirisha, S., Shujun, L., Gary, P.S., Daniel, S.R.: Meraculous: De Novo Genome Assembly with Short Paired-End Reads. PLoS ONE 6(8), e23501 (2011), doi:10.1371/journal.pone.0023501

    Article  Google Scholar 

  28. Vandenberghe, L., Boyd, S.: Semidefinite Programming. SIAM Review 38, 49–95 (1996)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, S., Bertrand, D., Nagarajan, N. (2012). FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science(), vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33122-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33121-3

  • Online ISBN: 978-3-642-33122-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics