Data mining of metagenomes to find novel enzymes: a non-computationally intensive method

Góngora-Castillo, Elsa; López-Ochoa, Luisa A.; Apolinar-Hernández, Max M.; Caamal-Pech, Aldo M.; Contreras-de la Rosa, Perla A.; Quiroz-Moreno, Adriana; Ramírez-Prado, Jorge H.; O’Connor-Sánchez, Aileen

doi:10.1007/s13205-019-2044-6

Protocols and Methods
Published: 30 January 2020

Volume 10, article number 78, (2020)
3 Biotech

Abstract

There is currently a need for bioinformatics tools that are not computationally intensive, to cope with the growing volume of large datasets produced by next-generation sequencing technologies. We present a simple and robust bioinformatics pipeline to search for novel enzymes in metagenomic sequences. The strategy is based on pattern searching, using conserved motifs encoded as regular expressions as the reference. As a case study, we applied this scheme to search for novel S8A proteases in a publicly available metagenome. Briefly, (1) the metagenome was assembled and translated into amino acids; (2) patterns were matched using regular expressions; (3) retrieved sequences were annotated; and (4) diversity analyses were conducted. Following this pipeline, we identified nine sequences containing an S8 catalytic triad, starting from a metagenome containing 9,921,136 Illumina reads. The identity of these nine sequences was confirmed by BLASTp against databases at NCBI and MEROPS. Sequence identities to their respective nearest orthologs ranged from 62 to 89%; these orthologs belonged to the phyla Proteobacteria, Actinobacteria, Planctomycetes, Bacteroidetes, and Cyanobacteria, consistent with the most abundant phyla reported for this metagenome. Together, these results support the conclusion that all nine are novel S8 sequences and strongly suggest that our methodology is robust and suitable for detecting novel enzymes.
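Steps (1) and (2) of the pipeline can be illustrated with a minimal Python sketch, assuming the contigs have already been assembled. It translates a nucleotide sequence in all six reading frames and scans each translation with a regular expression for three motifs in Asp, His, Ser order, uninterrupted by a stop codon. The motif fragments in `TRIAD_PATTERN` are simplified, hypothetical placeholders chosen for illustration; they are not the regular expressions used by the authors, and the function names are likewise assumptions.

```python
import re
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# HYPOTHETICAL, simplified motifs (not the authors' patterns): three short
# fragments in Asp -> His -> Ser order, separated by lazy gaps that may not
# cross a stop codon ("*").
TRIAD_PATTERN = re.compile(
    r"D[DSTA]G"          # fragment around the catalytic Asp (assumed)
    r"[^*]+?"            # gap within the same open reading frame
    r"HG[STM]"           # fragment around the catalytic His (assumed)
    r"[^*]+?"            # gap within the same open reading frame
    r"GTS[^*][SA][^*]P"  # fragment around the catalytic Ser (assumed)
)


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Yield (frame label, protein) for the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            yield f"{strand}{offset + 1}", translate(seq[offset:])


def find_triad_hits(dna):
    """Return (frame, start index in protein, matched segment) for every hit."""
    hits = []
    for frame, protein in six_frames(dna):
        for match in TRIAD_PATTERN.finditer(protein):
            hits.append((frame, match.start(), match.group()))
    return hits
```

Calling `find_triad_hits(contig)` on each assembled contig returns the candidate segments that would then go on to annotation, for example by BLASTp as in step (3). Because the scan is a single linear pass over each translated frame, it needs no alignment index or reference database, which is what keeps this kind of search computationally light.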

Acknowledgements

The authors wish to express their gratitude to the National Science and Technology Council, Mexico, for providing financial support for this research (Project No. INFR-2016-01-269833). The authors thank César de los Santos-Briones and Mildred R. Carrillo-Pech for their technical assistance.

Author information

Max M. Apolinar-Hernández
Present address: Instituto de Biotecnología, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Nuevo León, Mexico

Authors and Affiliations

Unidad de Biotecnología, CONACYT, Centro de Investigación Científica de Yucatán, A. C., Mérida, Yucatán, Mexico
Elsa Góngora-Castillo
Unidad de Bioquímica y Biología Molecular, Centro de Investigación Científica de Yucatán, A. C., Mérida, Yucatán, Mexico
Luisa A. López-Ochoa
Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A. C., Mérida, Yucatán, Mexico
Max M. Apolinar-Hernández, Aldo M. Caamal-Pech, Perla A. Contreras-de la Rosa, Adriana Quiroz-Moreno, Jorge H. Ramírez-Prado & Aileen O’Connor-Sánchez

Contributions

All authors contributed to this work. Góngora-Castillo and Ramírez-Prado designed and performed the experiments and analyzed the data; Caamal-Pech, Contreras-de la Rosa and Apolinar-Hernández participated in performing the experiments and in the data analysis. López-Ochoa and Quiroz-Moreno participated in drafting the paper and discussing the results. O’Connor-Sánchez, Ramírez-Prado and Góngora-Castillo conceived and designed the research and wrote the paper.

Corresponding authors

Correspondence to Jorge H. Ramírez-Prado or Aileen O’Connor-Sánchez.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical statement

Each of the authors confirms that this manuscript is original, has not been previously published, and is not currently under consideration by any other journal. Additionally, all of the authors have approved the contents of this paper and have agreed to 3 Biotech’s submission policies. The manuscript has two corresponding authors, Dr. Jorge H. Ramírez-Prado and Dr. Aileen O’Connor-Sánchez.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (TXT 5 kb)

Supplementary material 2 (TXT 7 kb)

Supplementary material 3 (DOCX 15 kb)

Supplementary material 4 (DOCX 14 kb)

About this article

Cite this article

Góngora-Castillo, E., López-Ochoa, L.A., Apolinar-Hernández, M.M. et al. Data mining of metagenomes to find novel enzymes: a non-computationally intensive method. 3 Biotech 10, 78 (2020). https://doi.org/10.1007/s13205-019-2044-6

Received: 16 August 2019
Accepted: 28 December 2019
Published: 30 January 2020
DOI: https://doi.org/10.1007/s13205-019-2044-6
