Keyword Identification Using Text Graphlet Patterns

Nabhan, Ahmed Ragab; Shaalan, Khaled

doi:10.1007/978-3-319-41754-7_13

Ahmed Ragab Nabhan^18,19 &
Khaled Shaalan^20,21

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9612))

Included in the following conference series:

International Conference on Applications of Natural Language to Information Systems

2133 Accesses
2 Citations

Abstract

Keyword identification is an important task that provides useful information for NLP applications including: document retrieval, clustering, and categorization, among others. State-of-the-art methods rely on local features of words (e.g. lexical, syntactic, and presentation features) to assess their candidacy as keywords. In this paper, we propose a novel keyword identification method that relies on representation of text abstracts as word graphs. The significance of the proposed method stems from a flexible data representation that expands the context of words to span multiple sentences and thus can enable capturing of important non-local graph topological features. Specifically, graphlets (small subgraph patterns) were efficiently extracted and scored to reflect the statistical dependency between these graphlet patterns and words labeled as keywords. Experimental results demonstrate the capability of the graphlet patterns in a keyword identification task when applied to MEDLINE, a standard research abstract dataset.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Andrade, M.A., Valencia, A.: Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families. Bioinformatics 14(7), 600–607 (1998)
Article Google Scholar
Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. Int. J. Artif. Intell. Tools 13(1), 157–169 (2004)
Article Google Scholar
Hammouda, K.M., Matute, D.N., Kamel, M.S.: CorePhrase: keyphrase extraction for document clustering. In: Perner, P., Imiya, A. (eds.) MLDM 2005. LNCS (LNAI), vol. 3587, pp. 265–274. Springer, Heidelberg (2005)
Chapter Google Scholar
Hasan, K.S., Ng, V.: Automatic keyphrase extraction: a survey of the state of the art. In: ACL, pp. 1262–1273 (2005)
Google Scholar
Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 121–124. ACM (2013)
Google Scholar
Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: KEA: practical automatic keyphrase extraction. In: Proceedings of the Fourth ACM Conference on Digital Libraries, pp. 254–255. ACM (1999)
Google Scholar
Liu, F., Pennell, D., Liu, F., Liu, Y.: Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of Human Language Technologies, pp. 620–628. Association for Computational Linguistics (2009)
Google Scholar
Tomokiyo, T., Hurst, M.: A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, vol. 18, pp. 33–40. Association for Computational Linguistics (2003)
Google Scholar
Zhang, Y., Zincir-Heywood, N., Milios, E.: Narrative text classification for automatic key phrase extraction in web document corpora. In: Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management, pp. 51–58. ACM (2005)
Google Scholar
Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. Association for Computational Linguistics, Stroudsburg (2004)
Google Scholar
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 22, 457–479 (2004)
Google Scholar
Bellaachia, A., Al-Dhelaan, M.: Ne-rank: a novel graph-based keyphrase extraction in twitter. In: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, vol. 01, pp. 372–379. IEEE Computer Society (2012)
Google Scholar
Jiang, C., Coenen, F., Sanderson, R., Zito, M.: Text classification using graph mining-based feature extraction. Knowl. Based Syst. 23(4), 302–308 (2010)
Article Google Scholar
Blanco, R., Lioma, C.: Graph-based term weighting for information retrieval. Inf. Retrieval 15(1), 54–92 (2012)
Article Google Scholar
Pržulj, N.: Biological network comparison using graphlet degree distribution. Bioinformatics 23(2), e177–e183 (2007)
Article Google Scholar
Vacic, V., Iakoucheva, L.M., Lonardi, S., Radivojac, P.: Graphlet kernels for prediction of functional residues in protein structures. J. Comput. Biol. 17(1), 55–72 (2010)
Article MathSciNet Google Scholar
Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL on Interactive Presentation Sessions, pp. 69–72. Association for Computational Linguistics (2006)
Google Scholar
Aizawa, A.: An information-theoretic perspective of TF-IDF measures. Inf. Process. Manage. 39(1), 45–65 (2003)
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Computers and Information, Fayoum University, Faiyum, Egypt
Ahmed Ragab Nabhan
Member Technology, Sears Holdings, Hoffman Estates, USA
Ahmed Ragab Nabhan
The British University in Dubai, Dubai, United Arab Emirates
Khaled Shaalan
School of Informatics, University of Edinburgh, Edinburgh, UK
Khaled Shaalan

Authors

Ahmed Ragab Nabhan
View author publications
You can also search for this author in PubMed Google Scholar
Khaled Shaalan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ahmed Ragab Nabhan .

Editor information

Editors and Affiliations

ConservatoireNational desArts et Métiers, Paris, France
Elisabeth Métais
University of Salford, Salford, United Kingdom
Farid Meziane
University of Salford, Salford, United Kingdom
Mohamad Saraee
Oakland University, Rochester, Michigan, USA
Vijayan Sugumaran
University of Salford, Salford, United Kingdom
Sunil Vadera

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Nabhan, A.R., Shaalan, K. (2016). Keyword Identification Using Text Graphlet Patterns. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science(), vol 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_13

Download citation

DOI: https://doi.org/10.1007/978-3-319-41754-7_13
Published: 17 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics