Statistical Identification of Domain-Specific Keyterms for Text Summarisation

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1513)

Abstract

We believe that, to be useful, a text summarisation technique must be domain dependent: the resulting summary must cover the important aspects and concepts specific to the domain of the subject matter. The main problem with a typical domain-dependent text summarisation technique is the cost of acquiring the required domain-specific knowledge and hand-coding it into the system, e.g., in the form of phrase-structure templates. To address this problem, we propose a technique that uses automatically retrieved sample documents as the source of the domain-specific knowledge and extracts that knowledge in the form of keyterms. These keyterms represent the key aspects and concepts (terminology) relevant to the input document. The sample documents are retrieved from a collection, called the base collection, which contains documents on various topics, based on their similarity to the input document. The input document is then summarised by extracting a number of sentences containing the keyterms.

Our text summarisation technique is based on the statistical distribution of words among documents in the base collection, within individual documents, and among sentences in the input document. In particular, statistically based formulae are employed for scoring each of the candidate sample documents, keyterms, and key sentences. Our technique makes use of standard word or term distribution parameters that are commonly provided by, or can easily be obtained from, modern text retrieval systems.
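The abstract does not state the scoring formulae themselves, so the following Python sketch only illustrates the three-stage flow it describes: retrieve sample documents, score keyterms, extract sentences. It is not the authors' method. Every concrete choice here is an assumption made for illustration: tf-idf weights with idf = log(N / df), cosine similarity for retrieval, similarity-weighted tf-idf for keyterm scores, and all function names, parameters, and toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(collection):
    """Map each term to log(N / df); terms found in every document get 0."""
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    n = len(collection)
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(text, idf):
    """Sparse tf-idf vector; terms unseen in the base collection weigh 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(text)).items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def summarise(input_doc, base_collection, n_samples=2, n_keyterms=5, n_sentences=1):
    """Return (sample documents, keyterms, summary sentences)."""
    idf = idf_table(base_collection)
    query = tfidf(input_doc, idf)

    # Stage 1: retrieve the base documents most similar to the input document.
    scored = [(cosine(query, tfidf(doc, idf)), doc) for doc in base_collection]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    samples = scored[:n_samples]

    # Stage 2 (assumed formula): score a term by its tf-idf weight in each
    # sample, weighted by that sample's similarity to the input. Only terms
    # that also occur in the input are kept, a simplification: other terms
    # cannot influence which input sentences are selected.
    term_score = defaultdict(float)
    for sim, doc in samples:
        for term, weight in tfidf(doc, idf).items():
            if term in query:
                term_score[term] += sim * weight
    candidates = [t for t in term_score if term_score[t] > 0]
    keyterms = sorted(candidates, key=term_score.get, reverse=True)[:n_keyterms]

    # Stage 3 (assumed formula): score each sentence by the summed scores of
    # the distinct keyterms it contains, then keep the best in original order.
    sentences = re.split(r"(?<=[.!?])\s+", input_doc.strip())

    def sentence_score(sentence):
        return sum(term_score[t] for t in set(tokenize(sentence)) if t in keyterms)

    best = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )[:n_sentences]
    summary = [sentences[i] for i in sorted(best)]
    return [doc for _, doc in samples], keyterms, summary


# Toy data, invented purely for illustration.
BASE_COLLECTION = [
    "The court ruled that the contract is void and the plaintiff won damages.",
    "The judge and the court heard the appeal on the contract dispute.",
    "The striker scored a goal and the team won the football match.",
    "Bake the bread in a hot oven and serve it with butter.",
]
INPUT_DOC = (
    "The court reviewed the contract yesterday. "
    "The weather was pleasant outside. Lunch was served at noon."
)

if __name__ == "__main__":
    samples, keyterms, summary = summarise(INPUT_DOC, BASE_COLLECTION)
    print("keyterms:", keyterms)
    print("summary:", summary)
```

With this toy data the two legal documents are retrieved as samples, "court" and "contract" become the keyterms, and the first sentence of the input is selected as the summary. In a real setting the document frequencies and term weights would come from the text retrieval system that indexes the base collection rather than being recomputed on every call.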

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yuwono, B., Adriani, M. (1998). Statistical Identification of Domain-Specific Keyterms for Text Summarisation. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_39

  • DOI: https://doi.org/10.1007/3-540-49653-X_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65101-7

  • Online ISBN: 978-3-540-49653-3

  • eBook Packages: Springer Book Archive
