HRTF Representation with Convolutional Auto-encoder

  • Conference paper
MultiMedia Modeling (MMM 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Abstract

The head-related transfer function (HRTF) is a filter that describes how sound from an arbitrary spatial direction is transformed on its way to the listener’s eardrums. HRTFs can be used to synthesize vivid virtual 3D sound that appears to come from any spatial location, which gives them an important role in 3D audio technology. However, the complexity and variability of the auditory cues inherent in HRTFs make it difficult to build an accurate mathematical model with conventional methods. In this paper, we propose an HRTF representation model based on a convolutional auto-encoder (CAE), an auto-encoder with convolutional layers in the encoder and deconvolution layers in the decoder. Experimental evaluation on the ARI HRTF database shows that the proposed model performs well at reducing the dimensionality of HRTFs.
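The encoder/decoder structure described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' model: the layer sizes, kernels, stride, and the 8-bin input vector are all hypothetical, and the weights are fixed rather than trained. It only shows how a strided convolution shortens a spectrum-like vector into a compact code and how a transposed convolution (often called deconvolution) maps the code back to the original length.

```python
# Illustrative only: sizes, kernels, and the input vector are hypothetical,
# not taken from the paper; weights are fixed here, not learned.

def conv1d(x, kernel, stride):
    """Valid 1-D convolution (cross-correlation) with the given stride."""
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def conv_transpose1d(z, kernel, stride):
    """Transposed 1-D convolution: each input sample scatters a scaled kernel."""
    k = len(kernel)
    out = [0.0] * ((len(z) - 1) * stride + k)
    for i, value in enumerate(z):
        for j in range(k):
            out[i * stride + j] += value * kernel[j]
    return out


# A toy 8-bin "magnitude response" standing in for one HRTF.
hrtf = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]

# Encoder: a stride-2 averaging kernel halves the length (8 -> 4).
code = conv1d(hrtf, kernel=[0.5, 0.5], stride=2)

# Decoder: a stride-2 transposed convolution restores the length (4 -> 8).
recon = conv_transpose1d(code, kernel=[1.0, 1.0], stride=2)

print(len(hrtf), len(code), len(recon))  # 8 4 8
print(code)                              # [1.5, 3.5, 3.5, 1.5]
print(recon)                             # [1.5, 1.5, 3.5, 3.5, 3.5, 3.5, 1.5, 1.5]
```

In a trained CAE the kernels would be learned by minimizing the reconstruction error between `hrtf` and `recon`, and several such layers with nonlinearities would typically be stacked; the fixed kernels here only make the shape bookkeeping visible.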

Acknowledgment

This work was supported by the National Key R&D Program of China (No. 2017YFB1002803) and the National Natural Science Foundation of China (No. 61701194, No. U1736206).

Author information

Corresponding author

Correspondence to Ruimin Hu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, W., Hu, R., Wang, X., Li, D. (2020). HRTF Representation with Convolutional Auto-encoder. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_49

  • DOI: https://doi.org/10.1007/978-3-030-37731-1_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37730-4

  • Online ISBN: 978-3-030-37731-1

  • eBook Packages: Computer Science, Computer Science (R0)
