A New Domain Independent Keyphrase Extraction System

Pudota, Nirmala; Dattolo, Antonina; Baruzzo, Andrea; Tasso, Carlo

doi:10.1007/978-3-642-15850-6_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 91)

Conference series: Italian Research Conference on Digital Libraries

Abstract

In this paper we present a keyphrase extraction system that extracts potential keyphrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from the input document. In defining candidate phrases and their respective feature sets, we combine linguistic knowledge (i.e., part-of-speech tags) with statistical information about each n-gram (i.e., its frequency, position, and lifespan). The proposed approach can be applied to any document; however, to assess its effectiveness for digital libraries, we evaluated the system on a set of scientific documents and compared our results with those of current keyphrase extraction systems.
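The candidate-generation step described in the abstract (word n-grams filtered by part-of-speech tags, each described by frequency, position and lifespan) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The input is assumed to be already POS-tagged with Penn Treebank tags. The filter pattern (adjectives and nouns only, ending in a noun) and the exact definitions of position (first occurrence divided by document length) and lifespan (distance between first and last occurrence divided by document length) are assumptions. The names `is_candidate_pattern`, `NgramFeatures` and `extract_candidates` are hypothetical. No scoring or ranking step is shown.

```python
from dataclasses import dataclass


def is_candidate_pattern(tags):
    """Hypothetical POS filter (an assumption, not the paper's exact rule):
    only adjectives (JJ*) and nouns (NN*) are allowed, and the last tag must be a noun."""
    if not tags[-1].startswith("NN"):
        return False
    return all(t.startswith(("NN", "JJ")) for t in tags)


@dataclass
class NgramFeatures:
    frequency: int         # number of occurrences in the document
    first_position: float  # assumed: first occurrence offset / document length
    lifespan: float        # assumed: (last offset - first offset) / document length


def extract_candidates(tagged_tokens, max_n=3):
    """Collect candidate n-grams (n = 1..max_n) from a list of (word, pos_tag) pairs."""
    words = [w.lower() for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]
    length = len(words)
    occurrences = {}  # n-gram string -> start offsets, in increasing order
    for n in range(1, max_n + 1):
        for i in range(length - n + 1):
            if is_candidate_pattern(tags[i:i + n]):
                gram = " ".join(words[i:i + n])
                occurrences.setdefault(gram, []).append(i)
    features = {}
    for gram, offsets in occurrences.items():
        first, last = offsets[0], offsets[-1]
        features[gram] = NgramFeatures(
            frequency=len(offsets),
            first_position=first / length,
            lifespan=(last - first) / length,
        )
    return features


if __name__ == "__main__":
    tagged = [("keyphrase", "NN"), ("extraction", "NN"), ("finds", "VBZ"),
              ("good", "JJ"), ("keyphrase", "NN"), ("candidates", "NNS")]
    for gram, feats in extract_candidates(tagged).items():
        print(gram, feats)
```

In this toy example, "keyphrase" occurs at offsets 0 and 4 of a six-token text, so it gets frequency 2, first position 0.0 and lifespan 4/6. The verb "finds" and the lone adjective "good" are rejected by the assumed POS filter. A real system would additionally normalize terms (for example with stemming) and score the candidates. Those steps are left out here because the abstract does not specify them.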

The authors acknowledge the financial support of the Italian Ministry of Education, University and Research (MIUR) within the FIRB project number RBIN04M8S8.

Author information

Authors and Affiliations

Artificial Intelligence Lab, Department of Mathematics and Computer Science, University of Udine, Italy
Nirmala Pudota, Antonina Dattolo, Andrea Baruzzo & Carlo Tasso

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo, 6/b, 35131, Padova, Italy
Maristella Agosti
Dipartimento di Informatica, Università di Bari, Via Orabona 4, 70126, Bari, Italy
Floriana Esposito
Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell’Informazione, Via Moruzzi, 1, 56124, Pisa, Italy
Costantino Thanos

About this paper

Cite this paper

Pudota, N., Dattolo, A., Baruzzo, A., Tasso, C. (2010). A New Domain Independent Keyphrase Extraction System. In: Agosti, M., Esposito, F., Thanos, C. (eds) Digital Libraries. IRCDL 2010. Communications in Computer and Information Science, vol 91. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15850-6_8

DOI: https://doi.org/10.1007/978-3-642-15850-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15849-0
Online ISBN: 978-3-642-15850-6
eBook Packages: Computer Science, Computer Science (R0)