Speech Data Enhancement Based on Hybrid Neural Network

  • Conference paper
  • First Online:
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Abstract

With the rapid development of artificial intelligence, the recognition of speech, text, physiological signals, and facial expressions has drawn increasing attention from scholars at home and abroad. Rather than studying the problems of one area in isolation, we should look for similarities across fields. In this paper, image enhancement methods are adapted to the characteristics of speech, and several feasible speech data enhancement methods are proposed to mitigate the difficulties of data collection and the limited size of corpora in speech emotion recognition. Based on a hybrid neural network model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN), the feasibility and performance of the proposed methods are verified through several sets of comparative experiments with different methods, and a high recognition accuracy is obtained.
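To make the idea of speech data enhancement more concrete, the following is a minimal sketch of waveform-level augmentations of the kind the abstract describes when adapting image enhancement to speech: noise injection, random time shifting, and amplitude scaling. The specific operations, parameter values, and function names here are illustrative assumptions, not the authors' exact pipeline or experimental settings.

# Minimal sketch of speech data augmentation (assumed operations, not the paper's exact method).
import numpy as np

def add_noise(signal: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at an assumed target signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal: np.ndarray, max_shift: int = 1600) -> np.ndarray:
    """Circularly shift the waveform by a random number of samples."""
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(signal, shift)

def scale_amplitude(signal: np.ndarray, low: float = 0.8, high: float = 1.2) -> np.ndarray:
    """Randomly rescale the waveform amplitude."""
    return signal * np.random.uniform(low, high)

if __name__ == "__main__":
    # Toy example: one second of a 16 kHz sine tone standing in for an utterance.
    sr = 16000
    t = np.arange(sr) / sr
    utterance = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    augmented = scale_amplitude(time_shift(add_noise(utterance)))
    print(utterance.shape, augmented.shape)

In practice, such augmented copies would be added to the training set (for example, before extracting spectrogram features fed to a CNN-RNN classifier) to compensate for small emotional-speech corpora.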



Acknowledgment

The work is supported by the State Key Program of the National Natural Science Foundation of China (61432004, 71571058, 61461045). It was also partially supported by a China Postdoctoral Science Foundation funded project (2017T100447), by the National Natural Science Foundation of China under Grant No. 61472117, and by the Qinghai Province Science and Technology Fund for foundational application research (No. 2016-ZJ-743).

Author information


Corresponding author

Correspondence to Xiao Sun.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Cao, X., Sun, X., Ren, F. (2018). Speech Data Enhancement Based on Hybrid Neural Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science(), vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_33

  • DOI: https://doi.org/10.1007/978-3-030-00764-5_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00763-8

  • Online ISBN: 978-3-030-00764-5

  • eBook Packages: Computer Science, Computer Science (R0)
