Abstract
We present an approach to learn the dynamics of multiple objects from image sequences in an unsupervised way. We introduce a probabilistic model that first generates noisy positions for each object through a separate linear state-space model, and then renders the positions of all objects in the same image through a highly non-linear process. Such a linear representation of the dynamics enables us to propose an inference method that uses exact and efficient inference tools and that can be deployed to query the model in different ways without retraining.
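The two-stage generative process described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the article: the constant-velocity transition matrix, the noise levels, the function names `simulate_positions` and `render`, and above all the hand-written Gaussian-blob renderer, which merely stands in for whatever learned non-linear rendering process the model actually uses.

```python
import numpy as np

def simulate_positions(n_objects, n_steps, rng, dt=1.0,
                       process_std=0.01, emission_std=0.02):
    """Sample noisy 2-D positions from one linear Gaussian
    state-space model per object (assumed constant-velocity form)."""
    # Hidden state is [x, y, vx, vy]; transition matrix A is an assumption.
    A = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # Emission matrix C reads the position out of the hidden state.
    C = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    positions = np.zeros((n_steps, n_objects, 2))
    for n in range(n_objects):
        # Each object has its own independent hidden-state trajectory.
        h = np.concatenate([rng.uniform(0.2, 0.8, size=2),
                            rng.normal(0.0, 0.02, size=2)])
        for t in range(n_steps):
            positions[t, n] = C @ h + rng.normal(0.0, emission_std, size=2)
            h = A @ h + rng.normal(0.0, process_std, size=4)
    return positions

def render(positions_t, size=32, radius=0.08):
    """Stand-in non-linear renderer: draw every object into one image
    as a Gaussian blob and combine the blobs with a pixel-wise maximum."""
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, size),
                         np.linspace(0.0, 1.0, size), indexing="ij")
    image = np.zeros((size, size))
    for x, y in positions_t:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * radius ** 2))
        image = np.maximum(image, blob)
    return image

rng = np.random.default_rng(0)
positions = simulate_positions(n_objects=3, n_steps=10, rng=rng)
frames = np.stack([render(p) for p in positions])  # one image per time step
```

Because each object's positions come from a linear Gaussian model, exact tools such as Kalman filtering and smoothing apply to the position sequences once they are inferred; only the mapping between positions and pixels is non-linear.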
Notes
Whilst in practice we need to consider all observed sequences in the KL, to simplify the notation we focus the exposition on one sequence only.
In practice, as the state \(s_0^n\) encodes which way we can interrogate \(v_1\) to infer \(a_1^n\), we have obtained better results by learning separate \(\phi _{s_0^n}\) that depend on the number of objects N in the image.
Cite this article
Chiappa, S., Paquet, U. Unsupervised separation of dynamics from pixels. METRON 77, 119–135 (2019). https://doi.org/10.1007/s40300-019-00155-4
DOI: https://doi.org/10.1007/s40300-019-00155-4