Audio Partitioning and Transcription for Broadcast Data Indexation

Gauvain, J.L.; Lamel, L.; Adda, G.

doi:10.1023/A:1011303401042

Published: June 2001

Multimedia Tools and Applications, Volume 14, pages 187–200 (2001)

Abstract

This work addresses automatic transcription of television and radio broadcasts in multiple languages. Transcribing such data is a major step in developing automatic tools for indexation and retrieval of the vast amounts of information generated on a daily basis. Radio and television broadcasts consist of a continuous data stream made up of segments of different linguistic and acoustic natures, which poses challenges for transcription. Prior to word recognition, the data is partitioned into homogeneous acoustic segments. Non-speech segments are identified and removed, and the speech segments are clustered and labeled according to bandwidth and gender. Word recognition is carried out with a speaker-independent, large-vocabulary continuous speech recognizer that uses n-gram statistics for language modeling and continuous-density HMMs with Gaussian mixtures for acoustic modeling. This system has consistently obtained top-level performance in DARPA evaluations. Over 500 hours of unpartitioned, unrestricted American English broadcast data have been partitioned, transcribed and indexed, with an average word error rate of about 20%. With current IR technology, information retrieval performance on the automatic transcriptions shows essentially no degradation relative to manual transcriptions of this data set.
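
The word error figure quoted in the abstract can be made concrete with a small sketch. The abstract does not say how the figure was scored, so the code below assumes the usual definition: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The function name `word_error_rate` is illustrative, and no text normalization (case, punctuation, compound words) is applied, which a real evaluation would normally do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace tokenization and exact string match per word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the data is partitioned into homogeneous acoustic segments"
    hyp = "the data is partition into homogeneous segments"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Under this definition, a word error rate of about 20% corresponds to roughly one error for every five reference words. Because insertions count as errors, the value can exceed 1.0 for very poor output.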

Author information

Authors and Affiliations

Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403, Orsay, France
J.L. Gauvain, L. Lamel & G. Adda

Corresponding author

Correspondence to J.L. Gauvain.

About this article

Cite this article

Gauvain, J., Lamel, L. & Adda, G. Audio Partitioning and Transcription for Broadcast Data Indexation. Multimedia Tools and Applications 14, 187–200 (2001). https://doi.org/10.1023/A:1011303401042
