
Iterative Imputation of Missing Data Using Auto-Encoder Dynamics

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2020)

Abstract

This paper introduces an approach to missing data imputation based on deep auto-encoder models, suited to high-dimensional data with complex dependencies, such as images. The method exploits the properties of the vector field associated with an auto-encoder, which allows the gradient of the log-density to be approximated from the reconstruction error; building on this, we propose a projected gradient ascent algorithm that finds the conditionally most probable estimate of the missing values. Our approach does not require any specialized training procedure and can be used with any auto-encoder trained on complete data in the standard way. Experiments on benchmark datasets show that the imputations produced by our model are sharp and realistic.

This is the extended version of an extended abstract [25] presented at the ICLR Workshop on the Integration of Deep Neural Models and Differential Equations.
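To make the procedure outlined in the abstract concrete, the sketch below (Python/NumPy) shows one way the projected gradient ascent could look: the auto-encoder's reconstruction residual r(x) − x serves as an approximation, up to scale, of the gradient of log p(x) [1, 29], and only the missing coordinates are updated while the observed ones are kept fixed. The function name, the step size eta, the iteration count, and the clipping range are illustrative assumptions, not the exact algorithm or hyperparameters from the paper.

    import numpy as np

    def impute_with_autoencoder(x, mask, reconstruct, eta=0.1, n_steps=100, clip=(0.0, 1.0)):
        """Hypothetical helper: refine the missing entries of `x` by projected gradient ascent.

        x           -- array with arbitrary initial values at the missing positions
        mask        -- boolean array of the same shape, True where the value is observed
        reconstruct -- callable returning the auto-encoder reconstruction r(x)
        eta         -- step size of the gradient ascent (assumed value)
        clip        -- value range used as a simple projection of the updated entries
        """
        x = x.copy()
        for _ in range(n_steps):
            # For a well-trained auto-encoder, r(x) - x points towards regions of
            # higher data density, i.e. it approximates grad log p(x) up to scale [1].
            grad_estimate = reconstruct(x) - x
            # Projected update: move only the missing coordinates; the observed
            # coordinates stay fixed, which is the projection onto the constraint
            # set {x : x_observed = given values}.
            x[~mask] += eta * grad_estimate[~mask]
            if clip is not None:
                x[~mask] = np.clip(x[~mask], *clip)
        return x

In practice the missing entries would be initialized with, e.g., the dataset mean, and the step size and stopping criterion tuned per dataset; the full text gives the authors' exact formulation.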


Notes

  1. A GMM can also be learned from incomplete data, but the imputation process does not change.

  2. For a comparison between different auto-encoder models in the proposed procedure, the reader is referred to our workshop paper [25].

References

  1. Alain, G., Bengio, Y.: What regularized auto-encoders learn from the data-generating distribution. J. Mach. Learn. Res. 15, 3563–3593 (2014)

  2. Azur, M., Stuart, E., Frangakis, C., Leaf, P.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011)

  3. Batista, G., Monard, M.: A study of k-nearest neighbour as an imputation method. Front. Artif. Intell. Appl. 97, 251–260 (2002)

  4. Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–68 (2010)

  5. Camino, R., Hammerschmidt, C., State, R.: Improving missing data imputation with deep generative models. arXiv preprint arXiv:1902.10666 (2019)

  6. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516 (2014)

  7. Gallinari, P., LeCun, Y., Thiria, S., Fogelman-Soulié, F.: Mémoires associatives distribuées. In: COGNITIVA 87, Paris (1987)

  8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

  9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  10. Hwang, U., Jung, D., Yoon, S.: HexaGAN: generative adversarial nets for real world classification. arXiv preprint arXiv:1902.09913 (2019)

  11. Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. (ToG) 36(4), 1–14 (2017)

  12. Kingma, D., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)

  13. LeCun, Y.: Modèles connexionnistes de l'apprentissage. Ph.D. thesis, Université de Paris VI (1987)

  14. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)

  15. Li, S., Jiang, B., Marlin, B.: MisGAN: learning from incomplete data with generative adversarial networks. arXiv preprint arXiv:1902.09599 (2019)

  16. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: International Conference on Computer Vision (2015)

  17. Luo, Y., Cai, X., Zhang, Y., Xu, J., Xiaojie, Y.: Multivariate time series imputation with generative adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1596–1607 (2018)

  18. Mattei, P.A., Frellsen, J.: Leveraging the exact likelihood of deep latent variable models. In: Advances in Neural Information Processing Systems, pp. 3855–3866 (2018)

  19. Mattei, P.A., Frellsen, J.: MIWAE: deep generative modelling and imputation of incomplete data sets. In: International Conference on Machine Learning, pp. 4413–4423 (2019)

  20. Nazabal, A., Olmos, P.M., Ghahramani, Z., Valera, I.: Handling incomplete heterogeneous data using VAEs. Pattern Recogn. 107, 107501 (2020)

  21. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.: Context encoders: feature learning by inpainting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)

  22. Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082 (2014)

  23. Sai Hareesh, A., Chandrasekaran, V.: A novel color image inpainting guided by structural similarity index measure and improved color angular radial transform. In: International Conference on Image Processing, Computer Vision, & Pattern Recognition, pp. 544–550 (2010)

  24. Śmieja, M., Struski, Ł., Tabor, J., Zieliński, B., Spurek, P.: Processing of missing data by neural networks. In: Advances in Neural Information Processing Systems, pp. 2719–2729 (2018)

  25. Śmieja, M., Kołomycki, M., Struski, Ł., Juda, M., Figueiredo, M.A.T.: Can auto-encoders help with filling missing data? In: ICLR Workshop on Integration of Deep Neural Models and Differential Equations (DeepDiffEq), p. 6 (2020)

  26. Stagakis, N., Zacharaki, E.I., Moustakas, K.: Hierarchical image inpainting by a deep context encoder exploiting structural similarity and saliency criteria. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds.) ICVS 2019. LNCS, vol. 11754, pp. 470–479. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34995-0_42

  27. Titterington, D., Sedransk, J.: Imputation of missing values using density estimation. Stat. Probab. Lett. 9(5), 411–418 (1989)

  28. Tolstikhin, I., Bousquet, O., Gelly, S., Schölkopf, B.: Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558 (2017)

  29. Vincent, P.: A connection between score matching and denoising autoencoders. Neural Comput. 23(7), 1661–1674 (2011)

  30. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  31. Yoon, J., Jordon, J., Van Der Schaar, M.: GAIN: missing data imputation using generative adversarial nets. arXiv preprint arXiv:1806.02920 (2018)

  32. Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual attention. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514 (2018)


Acknowledgements

The work of M. Śmieja was supported by the National Science Centre (Poland) grant no. 2018/31/B/ST6/00993. The work of Ł. Struski was supported by the National Science Centre (Poland) grant no. 2017/25/B/ST6/01271 as well as the Foundation for Polish Science grant no. POIR.04.04.00-00-14DE/18-00, co-financed by the European Union under the European Regional Development Fund. The work of M. Juda was supported by the National Science Centre (Poland) grants no. 2014/14/A/ST1/00453 and 2015/19/D/ST6/01215.

Author information


Corresponding author

Correspondence to Marek Śmieja.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Śmieja, M., Kołomycki, M., Struski, Ł., Juda, M., Figueiredo, M.A.T. (2020). Iterative Imputation of Missing Data Using Auto-Encoder Dynamics. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_22


  • DOI: https://doi.org/10.1007/978-3-030-63836-8_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63835-1

  • Online ISBN: 978-3-030-63836-8

  • eBook Packages: Computer Science, Computer Science (R0)
