Annotation in the SpeechDat Projects

International Journal of Speech Technology

Abstract

A large set of spoken language resources (SLR) for various European languages is being compiled in several SpeechDat projects, with the aim of training and testing speech recognizers for voice-driven services, mainly over telephone lines. This paper focuses on the annotation conventions applied to the SpeechDat SLR. These SLR contain typical examples of short monologue utterances with simple orthographic transcriptions in a hierarchically simple annotation structure. The annotation conventions and their underlying principles are described and compared to the approaches used for related SLR. The synchronization of the orthographic transcriptions with the corresponding speech files is addressed, and the impact of the selected approach on capturing specific phonological and phonetic phenomena is discussed. A number of tools have been developed in the SpeechDat projects to carry out the transcription of the speech, and this paper gives a short description of these tools and their properties. For all SpeechDat projects, an internal validity check of the databases and their annotations is carried out. The procedure of this validation campaign, the evaluations performed, and some of the results are presented.
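As a rough illustration of what an automatic check on orthographic annotations might look like, the sketch below tests each transcription against a lexicon and a small inventory of non-speech markers, then reports the fraction of transcriptions with at least one problem. The marker set (`[fil]`, `[spk]`, `[int]`, `[sta]`), the word flags (`*` for a mispronounced word, `~` for a truncated word, `**` for an unintelligible stretch) and the helper names `check_transcription` and `error_rate` are assumptions made for this example; they are not quoted from the SpeechDat specifications or validation software.

```python
from typing import Iterable, List, Set

# Hypothetical inventory of bracketed non-speech markers (assumed for illustration).
NOISE_MARKERS: Set[str] = {"[fil]", "[spk]", "[int]", "[sta]"}


def check_transcription(text: str, lexicon: Set[str]) -> List[str]:
    """Return the problems found in one orthographic transcription (hypothetical helper)."""
    tokens = text.split()
    if not tokens:
        return ["empty transcription"]
    problems: List[str] = []
    for tok in tokens:
        if tok.startswith("["):
            # Bracketed tokens must belong to the assumed marker inventory.
            if tok not in NOISE_MARKERS:
                problems.append(f"unknown marker {tok}")
            continue
        if tok == "**":
            # Assumed flag for an unintelligible stretch: nothing to look up.
            continue
        if "~" in tok:
            # Assumed truncation flag: word fragments are not lexicon entries.
            continue
        # Drop the assumed mispronunciation flag before the lexicon lookup.
        word = tok.lstrip("*").lower()
        if word not in lexicon:
            problems.append(f"word not in lexicon: {word}")
    return problems


def error_rate(transcriptions: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of transcriptions with at least one problem (hypothetical helper)."""
    items = list(transcriptions)
    if not items:
        return 0.0
    bad = sum(1 for t in items if check_transcription(t, lexicon))
    return bad / len(items)
```

For example, with `lexicon = {"call", "home", "please"}`, the transcription `"call [fil] home"` passes, while `"*call ho~ [xyz] pizza"` is flagged for the unknown marker and the out-of-lexicon word. A real validation campaign would also cover items such as signal quality and speaker coverage, which this sketch does not attempt.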

About this article

Cite this article

van den Heuvel, H., Boves, L., Moreno, A. et al. Annotation in the SpeechDat Projects. International Journal of Speech Technology 4, 127–143 (2001). https://doi.org/10.1023/A:1011375311203

  • DOI: https://doi.org/10.1023/A:1011375311203