Abstract
This work addresses the task of assisting human transcribers in producing transcriptions of, for example, interviews or parliamentary speeches. The system performs in-document adaptation based on a small amount of manually corrected automatic speech recognition output. The corrected segments of the spoken document are used to adapt the speech recognizer’s acoustic and language models. The updated models are then used in a second recognition pass to produce a more accurate automatic transcription of the remaining uncorrected parts of the spoken document. We evaluate two common adaptation methods for speech data in settings that represent typical transcription tasks: Maximum A Posteriori (MAP) adaptation for the acoustic model and linear interpolation for the language model. We compare supervised adaptation with unsupervised adaptation, and evaluate the total benefit of using human-corrected segments for in-document adaptation in typical transcription tasks.
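The two adaptation methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a unigram language model stored as a plain dictionary, a single Gaussian mean updated from hard-assigned feature frames (a full MAP update would weight frames by state occupation probabilities), and hypothetical names throughout (`interpolate_lm`, `map_adapt_mean`, the weight `lam`, and the prior weight `tau`).

```python
from typing import Dict, List, Sequence


def interpolate_lm(p_background: Dict[str, float],
                   p_adapt: Dict[str, float],
                   lam: float) -> Dict[str, float]:
    """Linear interpolation of two unigram models (illustrative assumption):
    P(w) = lam * P_adapt(w) + (1 - lam) * P_background(w)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_background) | set(p_adapt)
    # Words missing from one model contribute probability 0 from that model.
    return {
        w: lam * p_adapt.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   tau: float) -> List[float]:
    """MAP update of one Gaussian mean from hard-assigned frames:
    mu_new = (tau * mu_prior + sum_t x_t) / (tau + n)
    tau (> 0) controls how strongly the prior mean is trusted.
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    n = len(frames)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(f[d] for f in frames) for d in range(len(prior_mean))]
    return [(tau * m + s) / (tau + n) for m, s in zip(prior_mean, sums)]


if __name__ == "__main__":
    background = {"the": 0.5, "speech": 0.5}
    corrected = {"speech": 1.0}  # estimated from corrected segments
    print(interpolate_lm(background, corrected, lam=0.2))
    print(map_adapt_mean([0.0], [[2.0], [4.0]], tau=2.0))
```

With few corrected frames the MAP estimate stays close to the prior mean, and it moves toward the adaptation data as more frames arrive; the interpolation weight plays a comparable role for the language model.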
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mansikkaniemi, A., Kurimo, M., Lindén, K. (2016). In-Document Adaptation for a Human Guided Automatic Transcription Service. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_47
DOI: https://doi.org/10.1007/978-3-319-43958-7_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science (R0)