Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling

Ecotoxicological QSARs

Part of the book series: Methods in Pharmacology and Toxicology (MIPT)

Abstract

About 28 million chemical structures are registered today, while experimental toxicity data are available for only a few hundred thousand of them. Determining the properties and effects of all available chemicals is a huge task because of the cost of experimentation and legislative restrictions. Prediction is therefore the only practical solution, but it poses many challenges in terms of accuracy and interpretability. Predictive toxicology systems use statistics as well as methods based on machine learning (ML). While ML has been widely used in the pharmaceutical domain, its use in ecotoxicology is more limited. After reviewing experience with quantitative structure-activity relationships (QSARs) for modeling CMR (carcinogenic, mutagenic, reproductive) toxicity and PBT (persistent, bioaccumulative, and toxic) chemicals, we look at recent advances in ML technology. Recently, the investigation of the neural basis of many cognitive functions has provided the tools to create new systems that can think, solve problems, find patterns, and recognize images and texts; these new methods are named deep learning (DL). We modified the most successful DL architecture, implemented Toxception as a tool to generate QSAR models, and tested it in a real case on a dataset of about 20,000 molecules tested for mutagenicity with the Ames test. The results obtained challenge the current state of the art. In addition, Toxception does not use any chemistry knowledge besides the 2D structures derived from SMILES. We conclude by examining the advantages, open challenges, and drawbacks of building QSARs with DL.
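
The abstract frames Ames mutagenicity prediction as a binary classification problem judged on accuracy, but it does not say which metrics were used. The following is only a minimal sketch of how such a classifier is commonly scored, assuming labels of 1 for mutagenic and 0 for non-mutagenic. The function names and the toy label lists are hypothetical illustrations, not data or code from the chapter.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels: 1 = mutagenic, 0 = non-mutagenic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and Matthews correlation (MCC)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Toy labels invented for illustration only (not results from the chapter).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when mutagenic and non-mutagenic compounds are unevenly represented, since accuracy alone can hide poor performance on the minority class.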

Notes

  1. https://www.nvidia.com/en-us/data-center/dgx-1/

  2. https://www.vegahub.eu/download/

  3. https://toxnet.nlm.nih.gov/cpdb/

  4. ToxTree: http://toxtree.sourceforge.net/

  5. https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test

  6. http://image-net.org/challenges/LSVRC/

  7. https://www.kaggle.com/c/MerckActivity

  8. RDKit. URL https://bit.ly/2OYLjj9

  9. Talos. URL https://bit.ly/2yL9gQJ

Author information

Corresponding author

Correspondence to Giuseppina Gini.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Gini, G., Zanoli, F. (2020). Machine Learning and Deep Learning Methods in Ecotoxicological QSAR Modeling. In: Roy, K. (ed) Ecotoxicological QSARs. Methods in Pharmacology and Toxicology. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0150-1_6

  • DOI: https://doi.org/10.1007/978-1-0716-0150-1_6

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0149-5

  • Online ISBN: 978-1-0716-0150-1

  • eBook Packages: Springer Protocols
