A Language Independent Approach for Named Entity Recognition in Subject Headings

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6966)

Abstract

Subject heading systems are knowledge organization tools that libraries have developed over the years. The Simple Knowledge Organization System (SKOS) has provided a practical way to represent subject heading systems using the Resource Description Framework, and several libraries have taken the initiative to make their subject heading systems widely available as linked open data. Each individual subject heading describes a concept. In the majority of cases, however, one subject heading is actually a combination of several concepts, such as a topic bounded in geographical and temporal scope. In these cases, the label of the concept carries several sub-concepts that are not represented in structured form. Our work explores machine learning techniques to recognize the sub-concepts represented in the labels of SKOS subject headings. This paper describes a language-independent named entity recognition technique based on conditional random fields, a machine learning algorithm for sequence labelling. The technique was evaluated on a subset of the Library of Congress Subject Headings, where we measured the recognition of geographic concepts, topics, time periods and historical periods. It achieved an overall F1 score of 0.98.
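To make the sequence-labelling framing concrete, the following is a minimal sketch of the input side of such a model: a heading label is split into tokens, and each token is described by surface features that do not depend on any particular language. The tokenizer, the feature names, and the example label are assumptions chosen for illustration; the abstract does not specify the feature set or tokenization the authors used, and no CRF is trained here.

```python
import re

# Hypothetical tokenizer: runs of letters, runs of digits, or single
# punctuation marks. It relies on Unicode character classes only, so it
# makes no assumption about the language of the label.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]")


def tokenize(label):
    """Split a subject heading label into a flat list of tokens."""
    return TOKEN_RE.findall(label)


def token_shape(token):
    """Coarse, language-independent description of a token."""
    if token.isdigit():
        return "DIGITS"
    if token.isalpha():
        return "Xx" if token[0].isupper() else "x"
    # Punctuation, or any other token that is neither letters nor digits.
    return "PUNCT"


def features(tokens, i):
    """Illustrative features for token i: its own form plus neighbour shapes."""
    return {
        "lower": tokens[i].lower(),
        "shape": token_shape(tokens[i]),
        "length": len(tokens[i]),
        "prev_shape": token_shape(tokens[i - 1]) if i > 0 else "BOS",
        "next_shape": token_shape(tokens[i + 1]) if i + 1 < len(tokens) else "EOS",
    }


if __name__ == "__main__":
    # An LCSH-style label chosen for illustration: place, topic, period.
    tokens = tokenize("France--History--Revolution, 1789-1799")
    for i, token in enumerate(tokens):
        print(token, features(tokens, i))
```

A sequence labeller such as a conditional random field would take one feature dictionary per token and predict one label per token (for example place, topic, or time period), using the neighbouring tokens and labels as context.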

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Freire, N., Borbinha, J., Calado, P. (2011). A Language Independent Approach for Named Entity Recognition in Subject Headings. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science, vol 6966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24469-8_7

  • DOI: https://doi.org/10.1007/978-3-642-24469-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24468-1

  • Online ISBN: 978-3-642-24469-8

  • eBook Packages: Computer Science, Computer Science (R0)
