Abstract
Essential genes are those an organism requires to grow into a fertile adult and are pivotal for its survival. This study presents a new machine-learning-based computational approach that predicts essential genes by integrating homology, gene-intrinsic, and network-topology features. A set of 15 bacterial organisms with characterized essential genes served as reference species. Applying Extreme Gradient Boosting (XGBoost) to Bacillus subtilis 168, the classification model achieved an average AUC of 0.9649 under tenfold cross-validation. Applying the model to a closely related organism, Salmonella enterica serovar Typhimurium LT2, gave an AUC of 0.8608. To assess the stability and consistency of the proposed classifier, a second set of target organisms, Escherichia coli MG1655 and Streptococcus sanguinis SK36, and a second classifier based on principal component regression (PCR) were also evaluated. The PCR-based model produced lower AUC values for both sets of target organisms. These results indicate that the feature-integrated approach based on XGBoost identifies essential genes with better predictive accuracy.
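The evaluation protocol named in the abstract, an average AUC over tenfold cross-validation, can be illustrated with a minimal, model-agnostic sketch. It is not the authors' pipeline: the function names (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_centroid_scorer`) are hypothetical, the stratified split is an assumption, and the centroid scorer is only a toy stand-in for the XGBoost classifier so that the example runs with the standard library alone.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with a similar class ratio in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=10, seed=0):
    """Average held-out AUC over k folds.

    `fit(train_features, train_labels)` must return a function that maps one
    feature vector to a score (higher means more likely essential).
    """
    fold_aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        fold_aucs.append(auc([labels[i] for i in test_idx],
                             [score(features[i]) for i in test_idx]))
    return mean(fold_aucs)


def fit_centroid_scorer(X, y):
    """Toy stand-in model (assumption): project onto the class-centroid difference."""
    dim = len(X[0])

    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    weights = [a - b for a, b in zip(centroid(1), centroid(0))]
    return lambda x: sum(w * v for w, v in zip(weights, x))


if __name__ == "__main__":
    # Synthetic, perfectly separable data: one feature, 20 genes per class.
    features = [[float(i)] for i in range(40)]
    labels = [0] * 20 + [1] * 20
    print(cross_validated_auc(features, labels, fit_centroid_scorer))
```

To evaluate a gradient-boosted model instead, `fit` could wrap a trained classifier and return its predicted probability for the essential class; each fold must contain at least one gene from each class for the AUC to be defined.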
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singhal, A., Roy, D., Mittal, S., Dhar, J., Singh, A. (2019). A New Computational Approach to Identify Essential Genes in Bacterial Organisms Using Machine Learning. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume I. Advances in Intelligent Systems and Computing, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-13-1132-1_6
DOI: https://doi.org/10.1007/978-981-13-1132-1_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1131-4
Online ISBN: 978-981-13-1132-1
eBook Packages: Intelligent Technologies and Robotics (R0)