Estimation of spherical harmonic coefficients in sound field recording using feed-forward neural networks

Zhang, Lingkun; Wang, Xiaochen; Hu, Ruimin; Li, Dengshi; Tu, Weipin

doi:10.1007/s11042-020-09979-z

Published: 13 October 2020

Volume 80, pages 6187–6202, (2021)

Multimedia Tools and Applications

Lingkun Zhang^1,2,
Xiaochen Wang ORCID: orcid.org/0000-0002-1904-2097^1,2,
Ruimin Hu^1,2,
Dengshi Li^1,3 &
Weipin Tu^1,3

Abstract

Sound field recording using spherical harmonics (SH) is widely used. However, recording a sound field over a large area requires capturing higher-order SH coefficients, which in turn requires a large number of microphones. Our approach is inspired by deep learning for the game of Go: AlphaGo defeated top Go players after training on far less data than the full set of legal Go positions. Similarly, information learned from a specific dataset may allow the higher-order SH coefficients to be estimated from only a few captured sound pressures. In this paper, we investigate a learning-based approach to estimating the SH coefficients. In the proposed approach, the SH coefficients are estimated with a feed-forward neural network (FNN) from the measurements of a spherical array. We generate a uniformly distributed dataset to evaluate the method in an average situation. Moreover, we use real sound field data from the SOFiA dataset to evaluate the performance of the method when the correlations in the data are weak. Experimental results show that the proposed approach achieves higher estimation accuracy of the SH coefficients than a previously reported method. In simulations, the performance of 9 microphones using the proposed approach approximates that of an array with 16 microphones. The experiments confirm the feasibility of estimating the SH coefficients with a data-driven method, which can therefore be used to reduce the required number of microphones in a specific application.
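
The microphone counts quoted in the abstract match the standard relation that an SH representation up to order N has (N+1)^2 coefficients: 9 for order 2 and 16 for order 3. The following is a minimal sketch of that relation, together with the input and output shapes of a feed-forward network that maps 9 captured pressures to 16 coefficients. It is not the authors' network: the function names, the single hidden layer of width 32, the tanh activation, the untrained random weights, and the real-valued inputs are all assumptions made for illustration.

```python
import math
import random


def num_sh_coefficients(order):
    """Number of SH coefficients up to and including `order`: (N + 1)^2."""
    return (order + 1) ** 2


def max_order_for_mics(num_mics):
    """Highest order N that satisfies (N + 1)^2 <= num_mics."""
    return math.isqrt(num_mics) - 1


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def fnn_forward(pressures, layers):
    """Forward pass: tanh on hidden layers, linear output layer."""
    h = pressures
    for layer in layers[:-1]:
        h = dense(h, layer, math.tanh)
    return dense(h, layers[-1])


rng = random.Random(0)
n_mics = 9                                     # microphones actually used
n_coeffs = num_sh_coefficients(3)              # order-3 target: 16 coefficients
hidden_width = 32                              # assumed, not taken from the paper
layers = [init_layer(n_mics, hidden_width, rng),
          init_layer(hidden_width, n_coeffs, rng)]

# Placeholder real-valued pressures; real data would come from the array.
pressures = [rng.gauss(0.0, 1.0) for _ in range(n_mics)]
estimate = fnn_forward(pressures, layers)
print(max_order_for_mics(n_mics), len(estimate))
```

With 9 microphones, the conventional sampling requirement supports only order 2, whereas the network output here has the 16 entries of an order-3 representation. In practice the weights would be learned from paired examples of pressures and reference coefficients, and complex-valued pressures would need to be split into real and imaginary parts or handled by a complex-valued model.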

Acknowledgments

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206, No. 61761044), and the Hubei Province Technological Innovation Major Project (No. 2017AAA123).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, China
Lingkun Zhang, Xiaochen Wang, Ruimin Hu, Dengshi Li & Weipin Tu
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, China
Lingkun Zhang, Xiaochen Wang & Ruimin Hu
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430079, China
Dengshi Li & Weipin Tu

Corresponding author

Correspondence to Xiaochen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, L., Wang, X., Hu, R. et al. Estimation of spherical harmonic coefficients in sound field recording using feed-forward neural networks. Multimed Tools Appl 80, 6187–6202 (2021). https://doi.org/10.1007/s11042-020-09979-z

Received: 12 October 2019
Revised: 18 July 2020
Accepted: 24 September 2020
Published: 13 October 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11042-020-09979-z

Keywords
