
Iterative Imputation of Missing Data Using Auto-Encoder Dynamics

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2020)

Abstract

This paper introduces an approach to missing data imputation based on deep auto-encoder models, suited to high-dimensional data with complex dependencies, such as images. The method exploits the properties of the vector field associated with an auto-encoder, which allows the gradient of the log-density to be approximated from the reconstruction error; building on this, we propose a projected gradient ascent algorithm that finds the conditionally most probable estimate of the missing values. Our approach does not require any specialized training procedure and can be used with any auto-encoder trained on complete data in the standard way. Experiments on benchmark datasets show that the imputations produced by our model are sharp and realistic.

This is the extended version of an extended abstract [25] presented at the ICLR Workshop on the Integration of Deep Neural Models and Differential Equations.
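To make the procedure outlined in the abstract concrete, the sketch below (Python/NumPy) shows one way the projected gradient ascent could look: the auto-encoder's reconstruction residual r(x) − x serves as an approximation, up to scale, of the gradient of log p(x) [1, 29], and only the missing coordinates are updated while the observed ones are kept fixed. The function name, the step size eta, the iteration count, and the clipping range are illustrative assumptions, not the exact algorithm or hyperparameters from the paper.

    import numpy as np

    def impute_with_autoencoder(x, mask, reconstruct, eta=0.1, n_steps=100, clip=(0.0, 1.0)):
        """Hypothetical helper: refine the missing entries of `x` by projected gradient ascent.

        x           -- array with arbitrary initial values at the missing positions
        mask        -- boolean array of the same shape, True where the value is observed
        reconstruct -- callable returning the auto-encoder reconstruction r(x)
        eta         -- step size of the gradient ascent (assumed value)
        clip        -- value range used as a simple projection of the updated entries
        """
        x = x.copy()
        for _ in range(n_steps):
            # For a well-trained auto-encoder, r(x) - x points towards regions of
            # higher data density, i.e. it approximates grad log p(x) up to scale [1].
            grad_estimate = reconstruct(x) - x
            # Projected update: move only the missing coordinates; the observed
            # coordinates stay fixed, which is the projection onto the constraint
            # set {x : x_observed = given values}.
            x[~mask] += eta * grad_estimate[~mask]
            if clip is not None:
                x[~mask] = np.clip(x[~mask], *clip)
        return x

In practice the missing entries would be initialized with, e.g., the dataset mean, and the step size and stopping criterion tuned per dataset; the full text gives the authors' exact formulation.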


Notes

  1. A GMM can also be learned from incomplete data, but the imputation process does not change.

  2. For a comparison between different auto-encoder models in the proposed procedure, the reader is referred to our workshop paper [25].

References

  1. Alain, G., Bengio, Y.: What regularized auto-encoders learn from the data-generating distribution. J. Mach. Learn. Res. 15, 3563–3593 (2014)

  2. Azur, M., Stuart, E., Frangakis, C., Leaf, P.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatr. Res. 20, 40–49 (2011)

  3. Batista, G., Monard, M.: A study of k-nearest neighbour as an imputation method. Front. Artif. Intell. Appl. 97, 251–260 (2002)

  4. Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–68 (2010)

  5. Camino, R., Hammerschmidt, C., State, R.: Improving missing data imputation with deep generative models. arXiv preprint arXiv:1902.10666 (2019)

  6. Dinh, L., Krueger, D., Bengio, Y.: NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516 (2014)

  7. Gallinari, P., LeCun, Y., Thiria, S., Fogelman-Soulié, F.: Mémoires associatives distribuées. In: COGNITIVA 87, Paris (1987)

  8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

  9. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  10. Hwang, U., Jung, D., Yoon, S.: HexaGAN: generative adversarial nets for real world classification. arXiv preprint arXiv:1902.09913 (2019)

  11. Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. (ToG) 36(4), 1–14 (2017)

  12. Kingma, D., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (2014)

  13. LeCun, Y.: Modèles connexionnistes de l'apprentissage. Ph.D. thesis, Université de Paris VI (1987)

  14. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)

  15. Li, S., Jiang, B., Marlin, B.: MisGAN: learning from incomplete data with generative adversarial networks. arXiv preprint arXiv:1902.09599 (2019)

  16. Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: International Conference on Computer Vision (2015)

  17. Luo, Y., Cai, X., Zhang, Y., Xu, J., Xiaojie, Y.: Multivariate time series imputation with generative adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1596–1607 (2018)

  18. Mattei, P.A., Frellsen, J.: Leveraging the exact likelihood of deep latent variable models. In: Advances in Neural Information Processing Systems, pp. 3855–3866 (2018)

  19. Mattei, P.A., Frellsen, J.: MIWAE: deep generative modelling and imputation of incomplete data sets. In: International Conference on Machine Learning, pp. 4413–4423 (2019)

  20. Nazabal, A., Olmos, P.M., Ghahramani, Z., Valera, I.: Handling incomplete heterogeneous data using VAEs. Pattern Recogn. 107, 107501 (2020)

  21. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.: Context encoders: feature learning by inpainting. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)

  22. Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082 (2014)

  23. Sai Hareesh, A., Chandrasekaran, V.: A novel color image inpainting guided by structural similarity index measure and improved color angular radial transform. In: International Conference on Image Processing, Computer Vision, & Pattern Recognition, pp. 544–550 (2010)

  24. Śmieja, M., Struski, Ł., Tabor, J., Zieliński, B., Spurek, P.: Processing of missing data by neural networks. In: Advances in Neural Information Processing Systems, pp. 2719–2729 (2018)

  25. Śmieja, M., Kołomycki, M., Struski, Ł., Juda, M., Figueiredo, M.A.T.: Can auto-encoders help with filling missing data? In: ICLR Workshop on Integration of Deep Neural Models and Differential Equations (DeepDiffEq), p. 6 (2020)

  26. Stagakis, N., Zacharaki, E.I., Moustakas, K.: Hierarchical image inpainting by a deep context encoder exploiting structural similarity and saliency criteria. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds.) ICVS 2019. LNCS, vol. 11754, pp. 470–479. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34995-0_42

  27. Titterington, D., Sedransk, J.: Imputation of missing values using density estimation. Stat. Probab. Lett. 9(5), 411–418 (1989)

  28. Tolstikhin, I., Bousquet, O., Gelly, S., Schölkopf, B.: Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558 (2017)

  29. Vincent, P.: A connection between score matching and denoising autoencoders. Neural Comput. 23(7), 1661–1674 (2011)

  30. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  31. Yoon, J., Jordon, J., Van Der Schaar, M.: GAIN: missing data imputation using generative adversarial nets. arXiv preprint arXiv:1806.02920 (2018)

  32. Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual attention. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514 (2018)


Acknowledgements

The work of M. Śmieja was supported by the National Science Centre (Poland) grant no. 2018/31/B/ST6/00993. The work of Ł. Struski was supported by the National Science Centre (Poland) grant no. 2017/25/B/ST6/01271 as well as the Foundation for Polish Science grant no. POIR.04.04.00-00-14DE/18-00, co-financed by the European Union under the European Regional Development Fund. The work of M. Juda was supported by the National Science Centre (Poland) grants no. 2014/14/A/ST1/00453 and 2015/19/D/ST6/01215.

Author information


Corresponding author

Correspondence to Marek Śmieja.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Śmieja, M., Kołomycki, M., Struski, Ł., Juda, M., Figueiredo, M.A.T. (2020). Iterative Imputation of Missing Data Using Auto-Encoder Dynamics. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12534. Springer, Cham. https://doi.org/10.1007/978-3-030-63836-8_22


  • DOI: https://doi.org/10.1007/978-3-030-63836-8_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63835-1

  • Online ISBN: 978-3-030-63836-8

  • eBook Packages: Computer Science, Computer Science (R0)
