Abstract
In the last decade, the high throughput and relatively low cost of short-read sequencing technologies have revolutionized prokaryotic genomics. This has led to an exponential increase in the number of available bacterial and archaeal genome sequences, as well as a corresponding increase in the number of genome assembly and annotation tools. Together, these hardware and software technologies have given scientists unprecedented options to study their chosen microbial systems without the need for large teams of bioinformaticists or supercomputing facilities. While these analysis tools largely fall into only a few categories, each may have different requirements, caveats, and file formats, and some may be rarely updated or even abandoned. So, despite the apparent ease of sequencing and analyzing a prokaryotic genome, it is no wonder that the budding genomicist may quickly become overwhelmed. Here, we aim to provide the reader with an overview of genome annotation and its most important considerations, as well as an easy-to-follow protocol to get started with annotating a prokaryotic genome.
Acknowledgments
This work was performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and supported by the Genome Sciences Program of the Office of Biological and Environmental Research under the LLNL Biofuels SFA, FWP SCW1039.
Copyright information
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Kimbrel, J.A., Jeffrey, B.M., Ward, C.S. (2022). Prokaryotic Genome Annotation. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 2349. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1585-0_10
DOI: https://doi.org/10.1007/978-1-0716-1585-0_10
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1584-3
Online ISBN: 978-1-0716-1585-0
eBook Packages: Springer Protocols