Abstract
Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called “Gene” and “Genome,” accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user’s interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75(23):7537–7541. https://doi.org/10.1128/AEM.01541-09
Matsen FA, Kodner RB, Armbrust EV (2010) pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11:538. https://doi.org/10.1186/1471-2105-11-538
Balech B, Vicario S, Donvito G, Monaco A, Notarangelo P, Pesole G (2015) MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information. Bioinformatics 31(15):2571–2573. https://doi.org/10.1093/bioinformatics/btv141
Yang XF, Peng JJ, Liang HR, Yang YT, Wang YF, Wu XW, Pan JJ, Luo YW, Guo XF (2014) Gene order rearrangement of the M gene in the rabies virus leads to slower replication. Virusdisease 25(3):365–371. https://doi.org/10.1007/s13337-014-0220-1
Flanagan EB, Zamparo JM, Ball LA, Rodriguez LL, Wertz GW (2001) Rearrangement of the genes of vesicular stomatitis virus eliminates clinical disease in the natural host: new strategy for vaccine development. J Virol 75(13):6107–6114. https://doi.org/10.1128/JVI.75.13.6107-6114.2001
D’Onorio de Meo P, D’Antonio M, Griggio F, Lupi R, Borsani M, Pavesi G, Castrignano T, Pesole G, Gissi C (2012) MitoZoa 2.0: a database resource and search tools for comparative and evolutionary analyses of mitochondrial genomes in Metazoa. Nucleic Acids Res 40(Database issue):D1168–D1172. https://doi.org/10.1093/nar/gkr1144
Gai Y, Song D, Sun H, Yang Q, Zhou K (2008) The complete mitochondrial genome of Symphylella sp. (Myriapoda: Symphyla): extensive gene order rearrangement and evidence in favor of Progoneata. Mol Phylogenet Evol 49(2):574–585. https://doi.org/10.1016/j.ympev.2008.08.010
Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113. https://doi.org/10.1186/1471-2105-5-113
Katoh K, Standley DM (2016) A simple method to control over-alignment in the MAFFT multiple sequence alignment program. Bioinformatics 32(13):1933–1942. https://doi.org/10.1093/bioinformatics/btw108
Sievers F, Higgins DG (2014) Clustal omega, accurate alignment of very large numbers of sequences. Methods Mol Biol 1079:105–116. https://doi.org/10.1007/978-1-62703-646-7_6
Pei J, Grishin NV (2007) PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23(7):802–808. https://doi.org/10.1093/bioinformatics/btm017
Loytynoja A (2014) Phylogeny-aware alignment with PRANK. Methods Mol Biol 1079:155–170. https://doi.org/10.1007/978-1-62703-646-7_10
Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res 38(Web Server issue):W7–13. https://doi.org/10.1093/nar/gkq291
Rice P, Longden I, Bleasby A (2000) EMBOSS: the European molecular biology open software suite. Trends Genet 16(6):276–277
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M (2014) Pfam: the protein families database. Nucleic Acids Res 42(Database issue):D222–D230. https://doi.org/10.1093/nar/gkt1223
Johnson AD (2010) An extended IUPAC nomenclature code for polymorphic nucleic acids. Bioinformatics 26(10):1386–1389. https://doi.org/10.1093/bioinformatics/btq098
Coordinators NR (2017) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 45 (D1):D12-D17. doi:https://doi.org/10.1093/nar/gkw1071
Ratnasingham S, Hebert PD (2007) Bold: the barcode of life data system (http://www.barcodinglife.org). Mol Ecol Notes 7(3):355–364. https://doi.org/10.1111/j.1471-8286.2007.01678.x
Pickett BE, Greer DS, Zhang Y, Stewart L, Zhou L, Sun G, Gu Z, Kumar S, Zaremba S, Larsen CN, Jen W, Klem EB, Scheuermann RH (2012) Virus pathogen database and analysis resource (ViPR): a comprehensive bioinformatics database and analysis resource for the coronavirus research community. Virus 4(11):3209–3226. https://doi.org/10.3390/v4113209
Acknowledgment
This work was supported by the Italian nodes of Lifewatch and ELIXIR Research Infrastructures.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Balech, B. et al. (2018). DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method. In: Pantaleo, V., Chiumenti, M. (eds) Viral Metagenomics. Methods in Molecular Biology, vol 1746. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7683-6_13
Download citation
DOI: https://doi.org/10.1007/978-1-4939-7683-6_13
Published:
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7682-9
Online ISBN: 978-1-4939-7683-6
eBook Packages: Springer Protocols