Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling

  • Conference paper
  • In: Advances in Neural Computation, Machine Learning, and Cognitive Research IV (NEUROINFORMATICS 2020)

Part of the book series: Studies in Computational Intelligence (SCI, volume 925)

Included in the conference series: NEUROINFORMATICS

Abstract

An extensive and diverse dataset is a crucial requirement for successfully training a deep neural network. Compared to on-site data collection, 3D modeling makes it possible to generate large datasets faster and at lower cost. Still, the diversity and perceptual realism of synthetic images depend on the experience of the 3D artist. Moreover, hard sample mining with 3D modeling poses an open question: which synthetic images are challenging for an object detection model? We present an adversarial 3D modeling framework for training an object detection model against a reinforcement-learning-based adversarial controller. The controller alters the parameters of a 3D simulator to generate challenging synthetic images, aiming to minimize the score of the object detection model during training. We hypothesize that this objective for the controller maximizes the score of the detection model during inference on real-world data. We evaluate our approach by training a YOLOv3 object detection model with our adversarial framework. A comparison with a similar model trained on random synthetic and real images shows that our framework achieves better performance than training on random real or synthetic data.
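
To make the loop concrete, below is a minimal Python sketch of the adversarial data-generation cycle as the abstract describes it. It is an illustration under stated assumptions, not the authors' implementation: render_scene, detector_score, train_detector_step, and the parameter ranges are hypothetical placeholders, and a simple epsilon-greedy search stands in for the paper's reinforcement-learning controller.

    import random

    # Hypothetical 3D-simulator parameters the controller may alter.
    PARAM_RANGES = {
        "sun_elevation_deg": (0.0, 90.0),
        "fog_density": (0.0, 1.0),
        "camera_distance_m": (2.0, 50.0),
    }

    def render_scene(params):
        # Placeholder for the 3D simulator: returns a synthetic image
        # and its ground-truth labels for the rendered scene.
        return {"params": dict(params)}, ["ground_truth_boxes"]

    def detector_score(image, labels):
        # Placeholder for the detection score (e.g. mAP) of the current
        # model on this sample; a real run would evaluate YOLOv3 here.
        return random.random()

    def train_detector_step(image, labels):
        # Placeholder for one gradient step of the object detector.
        pass

    # Controller state: the hardest simulator configuration found so far.
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    best_params, best_score = dict(params), float("inf")

    for step in range(1000):
        if random.random() < 0.2:
            # Explore: perturb one simulator parameter at random.
            key = random.choice(list(PARAM_RANGES))
            params[key] = random.uniform(*PARAM_RANGES[key])
        else:
            # Exploit: regenerate the hardest known configuration.
            params = dict(best_params)
        image, labels = render_scene(params)
        score = detector_score(image, labels)
        train_detector_step(image, labels)  # the detector learns from the sample
        if score < best_score:
            # The controller's reward is the negative detector score,
            # so a lower score marks a harder (more valuable) scene.
            best_score, best_params = score, dict(params)

The design point the sketch preserves is the minimax objective: the controller is rewarded for lowering the detector's score, so simulator settings the detector handles poorly are revisited and mined as hard samples while the detector trains on them.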

Acknowledgments

The reported study was funded by the Russian Foundation for Basic Research (RFBR), project No. 17-29-03185, and by the Russian Science Foundation (RSF), research project No. 19-11-11008.

Author information

Corresponding author

Correspondence to Vladimir V. Kniaz.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kniaz, V.V., Knyaz, V.A., Mizginov, V., Papazyan, A., Fomin, N., Grodzitsky, L. (2021). Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research IV. NEUROINFORMATICS 2020. Studies in Computational Intelligence, vol 925. Springer, Cham. https://doi.org/10.1007/978-3-030-60577-3_38
