
Adversarial Robustness on In- and Out-Distribution Improves Explainability

  • Conference paper
  • Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12371)

Included in the conference series: European Conference on Computer Vision (ECCV)

Abstract

Neural networks have led to major improvements in image classification, but they suffer from non-robustness to adversarial changes, unreliable uncertainty estimates on out-distribution samples, and inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has generative properties similar to adversarial training, so that visual counterfactuals produce class-specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art \(l_2\)-adversarial robustness on CIFAR10 while maintaining better clean accuracy.
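The abstract describes RATIO only at a high level. As a rough, hypothetical illustration, the sketch below combines standard \(l_2\)-adversarial training on in-distribution batches with an adversarial low-confidence objective on out-distribution batches. The PGD attack, the choice of a KL-to-uniform loss on the out-distribution, and all hyperparameters (`eps`, `alpha`, `steps`, `lam`) are assumptions for illustration, not the authors' exact formulation.

```python
# Minimal sketch of a RATIO-style training step (assumed details, PyTorch).
import torch
import torch.nn.functional as F


def l2_pgd(model, x, loss_fn, eps=0.5, alpha=0.1, steps=7):
    """Projected gradient ascent of loss_fn within an l2 ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta))
        grad = torch.autograd.grad(loss, delta)[0]
        # l2-normalized ascent step
        g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta + alpha * grad / g_norm
        # project back onto the l2 ball of radius eps
        d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = (delta * (eps / d_norm).clamp(max=1.0)).detach().requires_grad_(True)
    return (x + delta).detach()


def ratio_step(model, optimizer, x_in, y_in, x_out, lam=1.0, eps=0.5):
    """One illustrative training step on an in- and an out-distribution batch."""
    model.eval()  # attack with frozen batch-norm statistics (assumed, common practice)
    # In-distribution: worst-case cross-entropy inside the l2 ball.
    x_in_adv = l2_pgd(model, x_in,
                      lambda logits: F.cross_entropy(logits, y_in), eps=eps)
    # Out-distribution: maximize the model's confidence inside the l2 ball.
    x_out_adv = l2_pgd(model, x_out,
                       lambda logits: logits.log_softmax(1).max(1).values.mean(),
                       eps=eps)

    model.train()
    logits_in, logits_out = model(x_in_adv), model(x_out_adv)
    # Correct labels on adversarial in-distribution points; near-uniform
    # (maximally uncertain) predictions on adversarial out-distribution points.
    uniform = torch.full_like(logits_out, 1.0 / logits_out.size(1))
    loss = F.cross_entropy(logits_in, y_in) + lam * F.kl_div(
        logits_out.log_softmax(1), uniform, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the weight `lam` trades off in-distribution robust accuracy against low confidence on adversarially perturbed out-distribution points; the actual objective and schedule used in the paper may differ.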



Acknowledgements

M.H. and A.M. acknowledge support by the BMBF Tübingen AI Center (FKZ: 01IS18039A), by DFG TRR 248 (project number 389792660), and by the DFG Excellence Cluster "Machine Learning - New Perspectives for Science" (EXC 2064/1, project number 390727645). A.M. thanks the IMPRS for Intelligent Systems.

Author information


Corresponding author

Correspondence to Maximilian Augustin.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 36,901 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Augustin, M., Meinke, A., Hein, M. (2020). Adversarial Robustness on In- and Out-Distribution Improves Explainability. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12371. Springer, Cham. https://doi.org/10.1007/978-3-030-58574-7_14


  • DOI: https://doi.org/10.1007/978-3-030-58574-7_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58573-0

  • Online ISBN: 978-3-030-58574-7

  • eBook Packages: Computer Science, Computer Science (R0)
