Prokaryotic Genome Annotation

  • Protocol
Microbial Systems Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 2349)

Abstract

In the last decade, the high throughput and relatively low cost of short-read sequencing technologies have revolutionized prokaryotic genomics. This has led to an exponential increase in the number of available bacterial and archaeal genome sequences, as well as a corresponding increase in the number of genome assembly and annotation tools. Together, these hardware and software technologies have given scientists unprecedented options to study their chosen microbial systems without the need for large teams of bioinformaticians or supercomputing facilities. While these analysis tools fall largely into only a few categories, each may have its own requirements, caveats, and file formats, and some may be rarely updated or even abandoned. So, despite the apparent ease of sequencing and analyzing a prokaryotic genome, it is no wonder that the budding genomicist can quickly become overwhelmed. Here, we aim to provide the reader with an overview of genome annotation and its most important considerations, as well as an easy-to-follow protocol for getting started with annotating a prokaryotic genome.
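Annotation tools differ in their outputs, but many (Prokka among them) can write their predicted features in the tab-delimited GFF3 format. As a minimal sketch, not part of the chapter's own protocol, the Python below tallies feature types from GFF3 text as a quick sanity check after an annotation run. It assumes standard GFF3 (nine tab-separated columns, feature type in column 3, `#`-prefixed comment and directive lines, and an optional trailing `##FASTA` sequence section); the function name `count_feature_types` and the toy records are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type) of GFF3 annotation lines.

    Comment/directive lines starting with '#' are skipped, and parsing
    stops at a '##FASTA' directive, after which a GFF3 file may embed
    the genome sequence instead of feature records.
    """
    counts: Counter = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the remainder of the file is sequence, not features
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        counts[fields[2]] += 1  # column 3 holds the feature type
    return counts


if __name__ == "__main__":
    # Hypothetical toy annotation: one contig with two CDS, one tRNA, one rRNA.
    example = [
        "##gff-version 3",
        "contig_1\t.\tCDS\t1\t900\t.\t+\t0\tID=gene_0001",
        "contig_1\t.\tCDS\t1000\t1600\t.\t-\t0\tID=gene_0002",
        "contig_1\t.\ttRNA\t1700\t1775\t.\t+\t.\tID=gene_0003",
        "contig_1\t.\trRNA\t2000\t3500\t.\t+\t.\tID=gene_0004",
        "##FASTA",
        ">contig_1",
        "ACGT",
    ]
    print(dict(count_feature_types(example)))
    # prints {'CDS': 2, 'tRNA': 1, 'rRNA': 1}
```

In practice the lines would come from an open file handle, e.g. `with open(path) as fh: count_feature_types(fh)`; stopping at `##FASTA` keeps embedded sequence lines from being misread as malformed feature records.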

Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and supported by the Genome Sciences Program of the Office of Biological and Environmental Research under the LLNL Biofuels SFA, FWP SCW1039.

Author information

Correspondence to Jeffrey A. Kimbrel.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Kimbrel, J.A., Jeffrey, B.M., Ward, C.S. (2022). Prokaryotic Genome Annotation. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 2349. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1585-0_10

  • DOI: https://doi.org/10.1007/978-1-0716-1585-0_10

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1584-3

  • Online ISBN: 978-1-0716-1585-0

  • eBook Packages: Springer Protocols
