Improving Transcript-Based Video Retrieval Using Unsupervised Language Model Adaptation

Wilhelm-Stein, Thomas; Herms, Robert; Ritter, Marc; Eibl, Maximilian

doi:10.1007/978-3-319-11382-1_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8685)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

One challenge in automatic speech recognition is recognizing domain-specific vocabulary, such as names, brands, and technical terms, with generic language models. Broadcast news is especially affected because new names occur frequently. We present an unsupervised language model adaptation method that is used in automatic speech recognition with a two-pass decoding strategy to improve spoken document retrieval on broadcast news. After keywords are extracted from each utterance, a web resource is queried to collect utterance-specific adaptation data. This data is used to augment the phonetic dictionary and to adapt the base language model. We evaluated this strategy on a data set of summarized German broadcast news using a basic retrieval setup.
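The adaptation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how keywords are selected, which web resource is queried, or how the language model is adapted. The sketch assumes that keywords come from a first-pass transcript, stands in a local dictionary for the web resource, and uses linear interpolation of unigram models as one common adaptation technique. All function names, the stopword list, and the example texts are hypothetical.

```python
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a full German list.
STOPWORDS = {"der", "die", "das", "und", "in", "hat", "am", "ist"}

def extract_keywords(transcript, top_k=3):
    """Pick the most frequent non-stopword tokens of a first-pass transcript."""
    tokens = [t for t in transcript.lower().split() if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]

def fetch_adaptation_text(keywords, resource):
    """Stand-in for a web query: look each keyword up in a local dict."""
    return " ".join(resource.get(k, "") for k in keywords)

def unigram_model(text):
    """Maximum-likelihood unigram probabilities over whitespace tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}

def interpolate(base, adapt, lam=0.5):
    """Linear interpolation: P(w) = lam * P_adapt(w) + (1 - lam) * P_base(w)."""
    vocab = set(base) | set(adapt)
    return {w: lam * adapt.get(w, 0.0) + (1 - lam) * base.get(w, 0.0)
            for w in vocab}

def new_vocabulary(base, adapt):
    """Words seen only in the adaptation data: dictionary candidates."""
    return sorted(set(adapt) - set(base))

# Usage with made-up data (assumption: one utterance, one adaptation round).
base_lm = unigram_model("die regierung hat das gesetz beschlossen")
first_pass = "die kanzlerin ist in chemnitz"
resource = {
    "kanzlerin": "die kanzlerin leitet die regierung",
    "chemnitz": "chemnitz ist eine stadt in sachsen",
}
keywords = extract_keywords(first_pass)
adapt_lm = unigram_model(fetch_adaptation_text(keywords, resource))
adapted_lm = interpolate(base_lm, adapt_lm, lam=0.5)
oov_candidates = new_vocabulary(base_lm, adapt_lm)
```

In a two-pass setup, the adapted model and the extended dictionary would be used for a second decoding pass over the same utterance. The interpolation weight `lam` is a free parameter here and would need tuning on held-out data.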

Author information

Authors and Affiliations

Technische Universität Chemnitz, 09107, Chemnitz, Germany
Thomas Wilhelm-Stein, Robert Herms, Marc Ritter & Maximilian Eibl

Editor information

Editors and Affiliations

Google Inc., Brandschenkestraße 110, 8002, Zurich, Switzerland
Evangelos Kanoulas
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstrasse 9-11, 1040, Vienna, Austria
Mihai Lupu
Information School, University of Sheffield, Sheffield, UK
Paul Clough
Department of Computer Science and IT, RMIT University, 3000, Melbourne, VIC, Australia
Mark Sanderson
Department of Computing, Edge Hill University, L39 4QP, Ormskirk, Lancashire, UK
Mark Hall
Vienna University of Technology, Austria
Allan Hanbury
Information School, University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, UK
Elaine Toms

About this paper

Cite this paper

Wilhelm-Stein, T., Herms, R., Ritter, M., Eibl, M. (2014). Improving Transcript-Based Video Retrieval Using Unsupervised Language Model Adaptation. In: Kanoulas, E., et al. Information Access Evaluation. Multilinguality, Multimodality, and Interaction. CLEF 2014. Lecture Notes in Computer Science, vol 8685. Springer, Cham. https://doi.org/10.1007/978-3-319-11382-1_11

DOI: https://doi.org/10.1007/978-3-319-11382-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11381-4
Online ISBN: 978-3-319-11382-1
eBook Packages: Computer Science, Computer Science (R0)
