Expressive Speech Synthesis Using Emotion-Specific Speech Inventories

Zainkó, Csaba; Fék, Márk; Németh, Géza

doi:10.1007/978-3-540-70872-8_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5042)

Abstract

In this paper we explore the use of emotion-specific speech inventories for expressive speech synthesis. We recorded a semantically neutral sentence and 26 logatoms containing all the diphones and CVC triphones necessary to synthesize the same sentence. The speech material was produced by a professional actress, who expressed all logatoms and the sentence with the six basic emotions and in a neutral tone. Seven emotion-dependent inventories were constructed from the logatoms. The seven inventories, paired with the prosody extracted from the seven natural sentences, were used to synthesize 49 sentences. A total of 194 listeners evaluated the emotions expressed in the logatoms and in the natural and synthetic sentences. The intended emotion was recognized above chance level for 99% of the logatoms and for all natural sentences. Recognition rates significantly above chance level were obtained for each emotion. The recognition rate for some synthetic sentences exceeded that of natural ones.
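
The 49 synthetic sentences follow from crossing each of the seven inventories with each of the seven prosody sources. The short Python sketch below only illustrates that design; it is not the authors' code. The emotion labels are an assumption (the abstract says only "six basic emotions" plus neutral), and the chance level of 1/7 assumes a forced choice among seven labels, which the abstract does not state.

```python
from itertools import product

# Assumed labels: the abstract names no emotions, only "six basic emotions"
# plus a neutral tone. These names are illustrative placeholders.
EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]


def synthesis_conditions(labels):
    """Pair every inventory emotion with every prosody-source emotion."""
    return list(product(labels, repeat=2))


conditions = synthesis_conditions(EMOTIONS)

# Conditions in which the inventory and the prosody carry the same emotion.
matched = [(inv, pros) for inv, pros in conditions if inv == pros]

# Hypothetical chance level, assuming listeners pick one of the seven labels.
chance_level = 1 / len(EMOTIONS)

print(len(conditions))  # 49 inventory/prosody pairings
print(len(matched))     # 7 matched pairings
```

The remaining 42 mismatched pairings are the ones that would let a listening test separate the contribution of the inventory from that of the prosody.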

Author information

Authors and Affiliations

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Magyar tudósok körútja 2., 1117, Budapest, Hungary
Csaba Zainkó, Márk Fék & Géza Németh

Editor information

Editors and Affiliations

Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare (SA), Italy
Anna Esposito
ATRC Center, Wright State University, Dayton, OH, USA
Nikolaos G. Bourbakis
Human Computer Interaction Group, University of Patras, Rio Patras, Greece
Nikolaos Avouris
Department of Computer Engineering, University of Patras, Patras, Greece
Ioannis Hatzilygeroudis

Cite this paper

Zainkó, C., Fék, M., Németh, G. (2008). Expressive Speech Synthesis Using Emotion-Specific Speech Inventories. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds) Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Lecture Notes in Computer Science, vol 5042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70872-8_17

DOI: https://doi.org/10.1007/978-3-540-70872-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70871-1
Online ISBN: 978-3-540-70872-8
eBook Packages: Computer Science, Computer Science (R0)
