A Simple Yet Effective Approach for Named Entity Recognition from Transcribed Broadcast News

  • Conference paper
Evaluation of Natural Language and Speech Tools for Italian (EVALITA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7689)

Abstract

Automatic speech transcriptions pose serious challenges for NLP systems because of various peculiarities in the data. In this paper, we propose a simple approach to NER on speech transcriptions that achieves good results despite these peculiarities. The novelty of our approach is that it emphasizes maximum exploitation of the tokens exactly as they appear in the data. We developed a system to participate in the “NER on Transcribed Broadcast News” (closed) task of the EVALITA 2011 evaluation campaign, where it was one of the best systems, obtaining an F1-score of 57.02 on the automatic speech transcription test data. On the manual transcriptions of the same test data (which also contain no sentence boundaries or punctuation symbols), the system achieves an F1-score of 73.54, which is quite high considering that the system is language independent and uses no external dictionaries, gazetteers or ontologies.
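
For readers unfamiliar with the metric, the F1-scores quoted in the abstract are the harmonic mean of precision and recall. The sketch below shows one common way to compute it for NER: an entity counts as correct only when both its token span and its type match the gold annotation exactly. That matching rule, the `(start, end, type)` entity representation, the function name, and the toy data are all assumptions made for illustration; the abstract does not describe the scoring procedure used in the campaign.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (first token index, last token index, type).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-type matching."""
    # A predicted entity is a true positive only if an identical gold entity exists.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {(0, 1, "PER"), (5, 5, "LOC"), (9, 10, "ORG"), (14, 14, "GPE")}
predicted = {(0, 1, "PER"), (5, 5, "GPE"), (9, 10, "ORG")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

In this toy case two of the three predictions match a gold entity, so precision is 2/3, recall is 2/4, and F1 is 4/7. Scores such as 57.02 and 73.54 are the same quantity expressed as a percentage.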

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chowdhury, M.F.M. (2013). A Simple Yet Effective Approach for Named Entity Recognition from Transcribed Broadcast News. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_11

  • DOI: https://doi.org/10.1007/978-3-642-35828-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35827-2

  • Online ISBN: 978-3-642-35828-9

  • eBook Packages: Computer Science, Computer Science (R0)
