
Generating Single Subject Activity Videos as a Sequence of Actions Using 3D Convolutional Generative Adversarial Networks

  • Conference paper
Artificial General Intelligence (AGI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10414)


Abstract

Humans have a remarkable capacity for imagination: within the mind, virtual simulations of scenarios are run across the visual, auditory, and other senses. These imaginings are grounded in experience gained through interaction with the real world, where the senses help the mind understand its surroundings. No current algorithm achieves this level of imagination, but a recent trend in deep learning architectures, Generative Adversarial Networks (GANs), has proven capable of generating new and interesting images or videos based on training data. In this way, GANs can be used to mimic human imagination, since the visuals a GAN generates are grounded in the data it was trained on. In this paper, we combine Long Short-Term Memory (LSTM) networks and 3D GANs to generate videos. A 3D convolutional GAN generates new human action videos from trained data, and these short action clips are assembled into longer videos in which a sequence of short actions forms a longer, more complex activity. To obtain the required sequence of actions, an LSTM network translates a simple input text description into a sequence of action labels. The generated chunks are then concatenated using a motion interpolation scheme to form a single video consisting of many generated actions. The result is a visualization of the input text description as a video of a subject performing the described activity.
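The final step described in the abstract, joining independently generated action clips with motion interpolation, can be sketched as follows. This is not the authors' code: the clip shapes, the blend length, and the simple linear cross-fade between the boundary frames are all assumptions for illustration, with random arrays standing in for GAN-generated clips.

```python
import numpy as np

def interpolate_chunks(chunk_a, chunk_b, n_blend=4):
    """Join two action clips by inserting n_blend transition frames that
    linearly blend the last frame of chunk_a into the first frame of
    chunk_b. Clips are arrays of shape (frames, height, width)."""
    last = chunk_a[-1].astype(float)
    first = chunk_b[0].astype(float)
    # Interior interpolation weights only; the endpoint frames already
    # exist in the two clips.
    alphas = np.linspace(0.0, 1.0, n_blend + 2)[1:-1]
    blend = np.stack([(1 - a) * last + a * first for a in alphas])
    return np.concatenate([chunk_a, blend, chunk_b], axis=0)

# Two stand-in "generated" 16-frame, 64x64 grayscale clips.
rng = np.random.default_rng(0)
clip_walk = rng.random((16, 64, 64))
clip_wave = rng.random((16, 64, 64))

video = interpolate_chunks(clip_walk, clip_wave, n_blend=4)
print(video.shape)  # (36, 64, 64): 16 + 4 blended + 16 frames
```

In practice the blending would run over every adjacent pair of clips that the LSTM-selected action sequence produces, yielding one continuous video.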



Acknowledgements

This work is supported by a Center of Excellence for Higher Education Research Grant funded by the Indonesian Ministry of Research and Higher Education, Contract No. 2626/UN2.R3.1/HKP05.00/2017. This paper is also supported by a GPU grant from NVIDIA.

Author information

Correspondence to Ahmad Arinaldi.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Arinaldi, A., Fanany, M.I. (2017). Generating Single Subject Activity Videos as a Sequence of Actions Using 3D Convolutional Generative Adversarial Networks. In: Everitt, T., Goertzel, B., Potapov, A. (eds.) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science, vol. 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_13


  • DOI: https://doi.org/10.1007/978-3-319-63703-7_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63702-0

  • Online ISBN: 978-3-319-63703-7

  • eBook Packages: Computer Science (R0)
