Technical terminology for domain specification and content characterisation

  • Conference paper
Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology (SCIE 1997)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1299)

Abstract

The identification and extraction of technical terms is one of the better understood and most robust natural language processing (NLP) technologies within the current state of the art of language engineering. What is particularly interesting here is the clear understanding of how to derive, from the linguistic properties of terms, computational procedures for their reliable identification and extraction from technical and scientific prose. In generic information management contexts, terms have been associated both with procedures seeking to identify a term set which uniquely distinguishes a document within a nearly homogeneous document collection, and with procedures seeking to extract a representative sample of terms which uniquely characterises a document's content. There is a wide range of uses for terminology, commonly identified with, for example, text indexing, computational lexicology, and machine-assisted translation; most of these rely on the notion that terminology is representative of a given domain. This paper discusses some specific extensions of the terminology identification technology to make it fully capable of domain specification; it also presents extensions of the technology beyond domain specification, for the purpose of document characterisation. These extensions make terminology identification the foundation of an operational environment for document processing, content characterisation, and abstraction; more generally, it becomes an immensely empowering technology in an age of growing information overload.

Editor information

Maria Teresa Pazienza

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boguraev, B., Kennedy, C. (1997). Technical terminology for domain specification and content characterisation. In: Pazienza, M.T. (ed.) Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. SCIE 1997. Lecture Notes in Computer Science, vol 1299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63438-X_5

  • DOI: https://doi.org/10.1007/3-540-63438-X_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63438-6

  • Online ISBN: 978-3-540-69548-6

  • eBook Packages: Springer Book Archive
