HMM-Based Emotional Speech Synthesis Using Average Emotion Model

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper presents a technique for synthesizing emotional speech based on an emotion-independent model called the “average emotion” model. The average emotion model is trained on a multi-emotion speech database. Applying an MLLR-based model adaptation method, we can transform the average emotion model to represent a target emotion that is not included in the training data. A multi-emotion speech database containing four emotions, “neutral”, “happiness”, “sadness”, and “anger”, is used in our experiments. The results of subjective tests show that the average emotion model can effectively synthesize neutral speech and can be adapted to a target emotion model using very limited training data.
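
The adaptation step referred to in the abstract is MLLR (maximum likelihood linear regression), which re-estimates the Gaussian mean vectors of the average emotion model as μ̂ = Aμ + b from a small amount of target-emotion speech. As a rough sketch only, and not the authors' implementation, the Python fragment below estimates a single global mean transform from per-Gaussian occupancy statistics; the diagonal-covariance assumption and all function and variable names are illustrative assumptions.

```python
# A minimal sketch of global MLLR mean adaptation, assuming diagonal-covariance
# Gaussians. Function and variable names are illustrative, not from the paper.
import numpy as np

def estimate_mllr_transform(means, variances, occupancies, obs_sums):
    """Estimate W = [b, A] such that the adapted mean is W @ [1, mu],
    maximising the likelihood of the adaptation data.

    means       : (S, D) Gaussian means of the average emotion model
    variances   : (S, D) diagonal covariances
    occupancies : (S,)   total occupancy per Gaussian, sum_t gamma_t(s)
    obs_sums    : (S, D) occupancy-weighted observation sums, sum_t gamma_t(s) o_t
    """
    S, D = means.shape
    xi = np.hstack([np.ones((S, 1)), means])     # extended mean vectors [1, mu]
    W = np.zeros((D, D + 1))
    for i in range(D):                           # each row of W is independent
        w = occupancies / variances[:, i]        # per-Gaussian weights
        G = (xi * w[:, None]).T @ xi             # (D+1, D+1) normal-equation matrix
        k = xi.T @ (obs_sums[:, i] / variances[:, i])
        W[i] = np.linalg.solve(G, k)
    return W

def adapt_means(means, W):
    """Apply mu_hat = A @ mu + b to every Gaussian of the average model."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

Tying all Gaussians to a single transform, as above, keeps the estimate well conditioned when adaptation data are very limited; with more target-emotion speech, the Gaussians can be grouped into regression classes, each receiving its own transform.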

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qin, L., Ling, ZH., Wu, YJ., Zhang, BF., Wang, RH. (2006). HMM-Based Emotional Speech Synthesis Using Average Emotion Model. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science (LNAI), vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_27

  • DOI: https://doi.org/10.1007/11939993_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3
