
Generating Single Subject Activity Videos as a Sequence of Actions Using 3D Convolutional Generative Adversarial Networks

  • Conference paper
Artificial General Intelligence (AGI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10414)


Abstract

Humans have a remarkable capacity for imagination: within the mind, virtual simulations of scenarios are run across the visual, auditory, and other senses. These imaginings are grounded in experience gained through interaction with the real world, where the senses help the mind understand its surroundings. No current algorithm achieves this level of imagination, but a recent trend in deep learning architectures, Generative Adversarial Networks (GANs), has proven capable of generating new and interesting images or videos based on training data. In this way, GANs can be used to mimic human imagination, since the visuals a GAN generates are grounded in the data it was trained on. In this paper, we combine Long Short-Term Memory (LSTM) networks and 3D GANs to generate videos. A 3D convolutional GAN generates new human action videos from trained data, and these short action clips are assembled into longer videos in which a sequence of short actions forms a longer, more complex activity. To obtain the required sequence of actions, an LSTM network translates a simple input text description into a sequence of action labels. The generated chunks are then concatenated using a motion interpolation scheme to form a single video consisting of many generated actions. The result is a visualization of the input text description as a video of a subject performing the described activity.
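The final step described in the abstract, joining independently generated action clips with motion interpolation, can be sketched as follows. This is not the authors' code: the clip shapes, the blend length, and the simple linear cross-fade between the boundary frames are all assumptions for illustration, with random arrays standing in for GAN-generated clips.

```python
import numpy as np

def interpolate_chunks(chunk_a, chunk_b, n_blend=4):
    """Join two action clips by inserting n_blend transition frames that
    linearly blend the last frame of chunk_a into the first frame of
    chunk_b. Clips are arrays of shape (frames, height, width)."""
    last = chunk_a[-1].astype(float)
    first = chunk_b[0].astype(float)
    # Interior interpolation weights only; the endpoint frames already
    # exist in the two clips.
    alphas = np.linspace(0.0, 1.0, n_blend + 2)[1:-1]
    blend = np.stack([(1 - a) * last + a * first for a in alphas])
    return np.concatenate([chunk_a, blend, chunk_b], axis=0)

# Two stand-in "generated" 16-frame, 64x64 grayscale clips.
rng = np.random.default_rng(0)
clip_walk = rng.random((16, 64, 64))
clip_wave = rng.random((16, 64, 64))

video = interpolate_chunks(clip_walk, clip_wave, n_blend=4)
print(video.shape)  # (36, 64, 64): 16 + 4 blended + 16 frames
```

In practice the blending would run over every adjacent pair of clips that the LSTM-selected action sequence produces, yielding one continuous video.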



Acknowledgements

This work is supported by a Center of Excellence for Higher Education Research Grant funded by the Indonesian Ministry of Research and Higher Education, Contract No. 2626/UN2.R3.1/HKP05.00/2017. This paper is also supported by a GPU grant from NVIDIA.

Author information

Correspondence to Ahmad Arinaldi.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Arinaldi, A., Fanany, M.I. (2017). Generating Single Subject Activity Videos as a Sequence of Actions Using 3D Convolutional Generative Adversarial Networks. In: Everitt, T., Goertzel, B., Potapov, A. (eds.) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol. 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_13


  • DOI: https://doi.org/10.1007/978-3-319-63703-7_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63702-0

  • Online ISBN: 978-3-319-63703-7

  • eBook Packages: Computer Science (R0)
