Statistical Identification of Domain-Specific Keyterms for Text Summarisation

Yuwono, Budi; Adriani, Mirna

doi:10.1007/3-540-49653-X_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1513)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

We believe that, to be useful, a text summarisation technique must be domain-dependent, in that the resulting summary must cover the important aspects and concepts specific to the domain of the subject matter. The main problem with a typical domain-dependent text summarisation technique is the cost of acquiring the required domain-specific knowledge and hand-coding it into the system, e.g., in the form of phrase-structure templates. To address this problem, we propose an approach that uses automatically retrieved sample documents as the source of the domain-specific knowledge, and extracts that knowledge in the form of keyterms. These keyterms represent the key aspects and concepts (terminology) relevant to the input document. The sample documents are retrieved from a collection, called the base collection, which contains documents on various topics; they are selected on the basis of their similarity to the input document. The input document is then summarised by extracting a number of sentences containing the keyterms.

Our text summarisation technique is based on the statistical distribution of words among documents in the base collection, within individual documents, and among sentences in the input document. In particular, statistically based formulae are employed for scoring each of the candidate sample documents, keyterms, and key sentences. Our technique makes use of standard word or term distribution parameters that are commonly provided by, or can easily be obtained from, modern text retrieval systems.
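
The three scoring stages named in the abstract (sample documents, keyterms, key sentences) can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formulae, so the sketch below substitutes a generic smoothed TF-IDF weight and cosine similarity; these are assumptions, not the authors' method. All function names, parameters, and defaults (`summarise`, `n_samples`, `n_keyterms`, `n_sentences`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def idf_table(base_collection):
    """Smoothed inverse document frequency for every term in the base collection."""
    n = len(base_collection)
    df = Counter()
    for doc in base_collection:
        df.update(set(tokenize(doc)))
    return {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def tfidf(text, idf, default_idf):
    """TF-IDF vector of a text as a dict; unseen terms get default_idf."""
    tf = Counter(tokenize(text))
    return {t: c * idf.get(t, default_idf) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarise(input_doc, base_collection, n_samples=2, n_keyterms=5, n_sentences=2):
    """Return (sample documents, keyterms, summary sentences) for input_doc.

    ASSUMPTION: TF-IDF and cosine similarity stand in for the paper's
    own (unstated) statistical formulae.
    """
    idf = idf_table(base_collection)
    default_idf = math.log(len(base_collection) + 1) + 1.0
    query = tfidf(input_doc, idf, default_idf)

    # Stage 1: retrieve the base-collection documents most similar to the input.
    ranked = sorted(
        base_collection,
        key=lambda d: cosine(query, tfidf(d, idf, default_idf)),
        reverse=True,
    )
    samples = ranked[:n_samples]

    # Stage 2: score terms of the input document by their summed weight
    # in the sample documents; the top-scoring terms are the keyterms.
    scores = Counter()
    for doc in samples:
        for term, weight in tfidf(doc, idf, default_idf).items():
            if term in query:
                scores[term] += weight
    keyterms = [t for t, _ in scores.most_common(n_keyterms)]

    # Stage 3: score each input sentence by the keyterms it contains and
    # return the best sentences in their original order.
    sentences = split_sentences(input_doc)

    def sentence_score(sentence):
        tokens = set(tokenize(sentence))
        return sum(scores[t] for t in keyterms if t in tokens)

    top = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )[:n_sentences]
    summary = [sentences[i] for i in sorted(top)]
    return samples, keyterms, summary
```

Calling `summarise(input_doc, base_collection)` with a list of plain-text documents returns the retrieved samples, the selected keyterms, and the extracted sentences. The sketch omits stop-word removal and stemming, which a practical system would likely need; with a very small base collection, the IDF weights alone do little to suppress common words.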

Author information

Authors and Affiliations

Graham Technology Plc., 41 Carlyle Avenue, Hillington, Glasgow, G52 4XX, Scotland
Budi Yuwono
Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, Scotland
Mirna Adriani

Editor information

Editors and Affiliations

Foundation for Research and Technology - Hellas (FORTH) Science and Technology Park of Crete, Institute of Computer Science (ICS), GR-71110, Heraklion, Crete, Greece
Christos Nikolaou & Constantine Stephanidis

About this paper

Cite this paper

Yuwono, B., Adriani, M. (1998). Statistical Identification of Domain-Specific Keyterms for Text Summarisation. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_39

DOI: https://doi.org/10.1007/3-540-49653-X_39
Published: 15 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65101-7
Online ISBN: 978-3-540-49653-3
eBook Packages: Springer Book Archive
