Speech Data Enhancement Based on Hybrid Neural Network

  • Conference paper
  • First Online:
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Abstract

With the rapid development of artificial intelligence, the recognition of speech, text, physiological signals, and facial expressions has drawn increasing attention from scholars at home and abroad. Rather than studying the problems of one area in isolation, we should look for similarities across fields. In this paper, image enhancement methods are adapted to the characteristics of speech, and several feasible speech data enhancement methods are proposed to mitigate the difficulties of data collection and the limited size of corpora in speech emotion recognition. Based on a hybrid neural network model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN), the feasibility and performance of the proposed methods are verified through several sets of comparative experiments with different methods, and a high recognition accuracy is obtained.
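To make the idea of speech data enhancement more concrete, the following is a minimal sketch of waveform-level augmentations of the kind the abstract describes when adapting image enhancement to speech: noise injection, random time shifting, and amplitude scaling. The specific operations, parameter values, and function names here are illustrative assumptions, not the authors' exact pipeline or experimental settings.

# Minimal sketch of speech data augmentation (assumed operations, not the paper's exact method).
import numpy as np

def add_noise(signal: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at an assumed target signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal: np.ndarray, max_shift: int = 1600) -> np.ndarray:
    """Circularly shift the waveform by a random number of samples."""
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(signal, shift)

def scale_amplitude(signal: np.ndarray, low: float = 0.8, high: float = 1.2) -> np.ndarray:
    """Randomly rescale the waveform amplitude."""
    return signal * np.random.uniform(low, high)

if __name__ == "__main__":
    # Toy example: one second of a 16 kHz sine tone standing in for an utterance.
    sr = 16000
    t = np.arange(sr) / sr
    utterance = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    augmented = scale_amplitude(time_shift(add_noise(utterance)))
    print(utterance.shape, augmented.shape)

In practice, such augmented copies would be added to the training set (for example, before extracting spectrogram features fed to a CNN-RNN classifier) to compensate for small emotional-speech corpora.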



Acknowledgment

The work is supported by the State Key Program of the National Natural Science Foundation of China (61432004, 71571058, 61461045). It was also partially supported by a China Postdoctoral Science Foundation funded project (2017T100447), by the National Natural Science Foundation of China under Grant No. 61472117, and by the Qinghai Province Science and Technology Fund for foundational application research (No. 2016-ZJ-743).

Author information


Corresponding author

Correspondence to Xiao Sun.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Cao, X., Sun, X., Ren, F. (2018). Speech Data Enhancement Based on Hybrid Neural Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science(), vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_33

  • DOI: https://doi.org/10.1007/978-3-030-00764-5_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00763-8

  • Online ISBN: 978-3-030-00764-5

  • eBook Packages: Computer Science, Computer Science (R0)
