Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling

  • Conference paper
  • In: Advances in Neural Computation, Machine Learning, and Cognitive Research IV (NEUROINFORMATICS 2020)

Part of the book series: Studies in Computational Intelligence (SCI, volume 925)

Included in the conference series: NEUROINFORMATICS

Abstract

An extensive and diverse dataset is a crucial requirement for successfully training a deep neural network. Compared to on-site data collection, 3D modeling makes it possible to generate large datasets faster and at lower cost. Still, the diversity and perceptual realism of synthetic images depend on the experience of the 3D artist. Moreover, hard sample mining with 3D modeling poses an open question: which synthetic images are challenging for an object detection model? We present an adversarial 3D modeling framework for training an object detection model against a reinforcement-learning-based adversarial controller. The controller alters the parameters of a 3D simulator to generate challenging synthetic images, aiming to minimize the score of the object detection model during training. We hypothesize that this objective for the controller maximizes the score of the detection model during inference on real-world data. We evaluate our approach by training a YOLOv3 object detection model with our adversarial framework. A comparison with a similar model trained on random synthetic and real images shows that our framework achieves better performance than training on random real or synthetic data.
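
To make the loop concrete, below is a minimal Python sketch of the adversarial data-generation cycle as the abstract describes it. It is an illustration under stated assumptions, not the authors' implementation: render_scene, detector_score, train_detector_step, and the parameter ranges are hypothetical placeholders, and a simple epsilon-greedy search stands in for the paper's reinforcement-learning controller.

    import random

    # Hypothetical 3D-simulator parameters the controller may alter.
    PARAM_RANGES = {
        "sun_elevation_deg": (0.0, 90.0),
        "fog_density": (0.0, 1.0),
        "camera_distance_m": (2.0, 50.0),
    }

    def render_scene(params):
        # Placeholder for the 3D simulator: returns a synthetic image
        # and its ground-truth labels for the rendered scene.
        return {"params": dict(params)}, ["ground_truth_boxes"]

    def detector_score(image, labels):
        # Placeholder for the detection score (e.g. mAP) of the current
        # model on this sample; a real run would evaluate YOLOv3 here.
        return random.random()

    def train_detector_step(image, labels):
        # Placeholder for one gradient step of the object detector.
        pass

    # Controller state: the hardest simulator configuration found so far.
    params = {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    best_params, best_score = dict(params), float("inf")

    for step in range(1000):
        if random.random() < 0.2:
            # Explore: perturb one simulator parameter at random.
            key = random.choice(list(PARAM_RANGES))
            params[key] = random.uniform(*PARAM_RANGES[key])
        else:
            # Exploit: regenerate the hardest known configuration.
            params = dict(best_params)
        image, labels = render_scene(params)
        score = detector_score(image, labels)
        train_detector_step(image, labels)  # the detector learns from the sample
        if score < best_score:
            # The controller's reward is the negative detector score,
            # so a lower score marks a harder (more valuable) scene.
            best_score, best_params = score, dict(params)

The design point the sketch preserves is the minimax objective: the controller is rewarded for lowering the detector's score, so simulator settings the detector handles poorly are revisited and mined as hard samples while the detector trains on them.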

Acknowledgments

The reported study was funded by the Russian Foundation for Basic Research (RFBR), project No. 17-29-03185, and by the Russian Science Foundation (RSF), research project No. 19-11-11008.

Author information

Corresponding author

Correspondence to Vladimir V. Kniaz.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kniaz, V.V., Knyaz, V.A., Mizginov, V., Papazyan, A., Fomin, N., Grodzitsky, L. (2021). Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research IV. NEUROINFORMATICS 2020. Studies in Computational Intelligence, vol 925. Springer, Cham. https://doi.org/10.1007/978-3-030-60577-3_38
