Abstract
Toxicology studies raise several concerns, which makes it important to detect the potential toxicity of chemical compounds early. Toxicity is currently evaluated either through in vitro assays that assess a compound's bioactivity, or through in vivo tests on animals, which are costly and ethically questionable. We therefore investigate the prediction of the bioactivity of chemical compounds from their physico-chemical structure, and propose automating this prediction with machine learning (ML) techniques trained on data from the in vitro assessment of several hundred chemical compounds. We report test results for several ML techniques on both a restricted dataset and a larger one. Because the available empirical data is unbalanced, we also apply data augmentation techniques to improve classification accuracy, and present the resulting improvements.
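The abstract does not say which data augmentation technique is applied to the unbalanced data. As a minimal sketch of the general idea, the example below assumes a SMOTE-style scheme that creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The function names, the toy feature vectors, the binary 0/1 labels, and the choice of k = 3 are all hypothetical and are not taken from the paper.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature vectors (needs at least 2 samples).
    k: number of nearest minority neighbours considered (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # New point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class of a binary problem to equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = smote_like_oversample(minority, n_majority - len(minority), k, seed)
    return X + extra, y + [minority_label] * len(extra)


# Toy data (hypothetical): 6 inactive compounds (0) and 2 active ones (1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [0.9, 0.8], [1.0, 1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # class counts after balancing
```

In a setup like this, augmentation would normally be applied to the training split only, so that the evaluation set still reflects the original class imbalance.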
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Grenet, I., Yin, Y., Comet, JP., Gelenbe, E. (2018). Machine Learning to Predict Toxicity of Compounds. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. Lecture Notes in Computer Science, vol 11139. Springer, Cham. https://doi.org/10.1007/978-3-030-01418-6_33
DOI: https://doi.org/10.1007/978-3-030-01418-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01417-9
Online ISBN: 978-3-030-01418-6
eBook Packages: Computer Science, Computer Science (R0)