Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling

Gini, Giuseppina; Zanoli, Francesco

doi:10.1007/978-1-0716-0150-1_6

Part of the book series: Methods in Pharmacology and Toxicology ((MIPT))

Abstract

Today about 28 million chemical structures are registered, while experimental toxicity data are available for only a few hundred thousand of them. Defining properties and effects for all the available chemicals is a huge task because of the cost of experimentation and legislative restrictions. Therefore, prediction is the only available solution, but it poses many challenges in terms of accuracy and interpretability. Predictive toxicology systems use statistics as well as methods based on machine learning (ML). While ML has been widely used in the pharmaceutical domain, its use in ecotoxicology is more limited. After reviewing the experiences in quantitative structure-activity relationships (QSARs) for modeling CMR (carcinogenic, mutagenic, reproductive) toxicity and PBT (persistent, bioaccumulative, and toxic) chemicals, we look at the technological advances in ML. Recently, the investigation of the neural basis for many cognitive functions has provided the tools to create new systems that can think, solve problems, find patterns, and recognize images and texts; these new methods are named deep learning (DL). We modified the most successful DL architecture, implemented Toxception as a tool to generate QSAR models, and tested it in a real case, on a dataset of about 20,000 molecules tested for mutagenicity with the Ames test. The results obtained challenge the current state of the art. In addition, Toxception does not use any chemistry knowledge besides the 2D structures derived from SMILES. We conclude by examining the advantages, open challenges, and drawbacks of building QSARs with DL.

Author information

Authors and Affiliations

DEIB, Politecnico di Milano, Milan, MI, Italy
Giuseppina Gini & Francesco Zanoli

Corresponding author

Correspondence to Giuseppina Gini.

Editor information

Editors and Affiliations

Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
Kunal Roy

About this protocol

Cite this protocol

Gini, G., Zanoli, F. (2020). Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling. In: Roy, K. (ed) Ecotoxicological QSARs. Methods in Pharmacology and Toxicology. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0150-1_6

DOI: https://doi.org/10.1007/978-1-0716-0150-1_6
Published: 17 January 2020
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0149-5
Online ISBN: 978-1-0716-0150-1
eBook Packages: Springer Protocols