Learnable Phonetic Representations in a Connectionist TTS System — I: Text to Phonetics

Cohen, Andrew D.

doi:10.1007/978-1-4757-3413-3_8

Part of the book series: Telecommunications Technology & Applications Series (TTAP)

Abstract

Results from connectionist experiments in text-to-speech conversion suggest that non-symbolic intermediate (‘phonetic’) representations may have a useful part to play in the design of a synthesis system. A similar strategy suggests itself in the subsequent stage when speech is produced from the intermediate representation, which makes it possible to bypass a symbolic, phonemic stage in the overall system, once trained. (This second stage is dealt with in a later chapter.) Error can still be calculated in terms of phonemes correct, but this is not necessarily a good measure of the naturalness and acceptability of the output speech. In contrast to other trainable text-to-speech systems, emphasis is laid here on the fundamental importance of phonetic and phonological sources of variability, and their separation from the underlying physical and temporal events. As far as possible, this phonetic/phonological capability should be built into the system prior to training on the main task at hand, as this corresponds more closely to the way these skills are acquired in human beings.

Editor information

Editors and Affiliations

Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, SO17 1BJ, Southampton, UK
Robert I. Damper

About this chapter

Cite this chapter

Cohen, A.D. (2001). Learnable Phonetic Representations in a Connectionist TTS System — I: Text to Phonetics. In: Damper, R.I. (ed.) Data-Driven Techniques in Speech Synthesis. Telecommunications Technology & Applications Series. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3413-3_8

DOI: https://doi.org/10.1007/978-1-4757-3413-3_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4733-8
Online ISBN: 978-1-4757-3413-3
eBook Packages: Springer Book Archive
