HMM-Based Emotional Speech Synthesis Using Average Emotion Model

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper presents a technique for synthesizing emotional speech based on an emotion-independent model called the “average emotion” model. The average emotion model is trained on a multi-emotion speech database. Applying an MLLR-based model adaptation method, we can transform the average emotion model to represent a target emotion that is not included in the training data. A multi-emotion speech database containing four emotions, “neutral”, “happiness”, “sadness”, and “anger”, is used in our experiments. The results of subjective tests show that the average emotion model can effectively synthesize neutral speech and can be adapted to a target emotion model using very limited training data.
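
The adaptation step referred to in the abstract is MLLR (maximum likelihood linear regression), which re-estimates the Gaussian mean vectors of the average emotion model as μ̂ = Aμ + b from a small amount of target-emotion speech. As a rough sketch only, and not the authors' implementation, the Python fragment below estimates a single global mean transform from per-Gaussian occupancy statistics; the diagonal-covariance assumption and all function and variable names are illustrative assumptions.

```python
# A minimal sketch of global MLLR mean adaptation, assuming diagonal-covariance
# Gaussians. Function and variable names are illustrative, not from the paper.
import numpy as np

def estimate_mllr_transform(means, variances, occupancies, obs_sums):
    """Estimate W = [b, A] such that the adapted mean is W @ [1, mu],
    maximising the likelihood of the adaptation data.

    means       : (S, D) Gaussian means of the average emotion model
    variances   : (S, D) diagonal covariances
    occupancies : (S,)   total occupancy per Gaussian, sum_t gamma_t(s)
    obs_sums    : (S, D) occupancy-weighted observation sums, sum_t gamma_t(s) o_t
    """
    S, D = means.shape
    xi = np.hstack([np.ones((S, 1)), means])     # extended mean vectors [1, mu]
    W = np.zeros((D, D + 1))
    for i in range(D):                           # each row of W is independent
        w = occupancies / variances[:, i]        # per-Gaussian weights
        G = (xi * w[:, None]).T @ xi             # (D+1, D+1) normal-equation matrix
        k = xi.T @ (obs_sums[:, i] / variances[:, i])
        W[i] = np.linalg.solve(G, k)
    return W

def adapt_means(means, W):
    """Apply mu_hat = A @ mu + b to every Gaussian of the average model."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Tying all Gaussians to a single transform, as above, keeps the estimate well conditioned when adaptation data are very limited; with more target-emotion speech, the Gaussians can be grouped into regression classes, each receiving its own transform.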

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, L., Ling, ZH., Wu, YJ., Zhang, BF., Wang, RH. (2006). HMM-Based Emotional Speech Synthesis Using Average Emotion Model. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science (LNAI), vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_27

  • DOI: https://doi.org/10.1007/11939993_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3
