Abstract
When you search the web for information about a particular person, a search engine returns many pages, some of which may describe different people who share the same name. How can we disambiguate these namesakes? This paper presents an unsupervised algorithm that produces key phrases for each of the different people sharing a name. These key phrases could be used to narrow the search further, yielding more person-specific, unambiguous information. The proposed algorithm does not require any biographical or social information about the person. Although there has been previous work on personal name disambiguation on the web, to our knowledge this is the first attempt to extract key phrases to disambiguate different people with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 web pages retrieved from Google using personal-name queries. Our experimental results show an improvement over existing methods for namesake disambiguation.
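The abstract does not spell out the algorithm's individual steps. As a rough illustration of the general idea only (group the retrieved pages by person, then pick phrases that characterize each group), here is a minimal sketch. Everything in it is an assumption rather than the paper's method: the TF-IDF weighting, the group-average agglomerative clustering, the 0.1 similarity threshold, the unigram/bigram candidates, the "in-cluster minus out-of-cluster" scoring, the stopword list, and the four toy pages, which are invented for illustration and are not from the authors' dataset.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on", "the", "to", "was", "with"}


def tokenize(text, name_tokens):
    """Lowercase word tokens, dropping stopwords and the ambiguous name itself."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS and w not in name_tokens]


def tfidf_vectors(token_docs):
    """Sparse TF-IDF vectors (term -> weight dicts), using idf = log(N / df)."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in token_docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average_clustering(vectors, threshold):
    """Repeatedly merge the two clusters with the highest average pairwise
    similarity until no pair reaches `threshold`."""
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if avg > best:
                    best, pair = avg, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def candidate_phrases(tokens):
    """Unigrams plus adjacent bigrams (adjacent after stopword removal)."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def key_phrases(token_docs, cluster, k=3):
    """Rank phrases by (fraction of cluster pages containing them) minus
    (fraction of other pages containing them); prefer longer phrases on ties."""
    inside = set(cluster)
    n_in = len(inside)
    n_out = max(1, len(token_docs) - n_in)
    df_in, df_out = Counter(), Counter()
    for i, doc in enumerate(token_docs):
        (df_in if i in inside else df_out).update(set(candidate_phrases(doc)))
    scores = {p: df_in[p] / n_in - df_out[p] / n_out for p in df_in}
    ranked = sorted(scores, key=lambda p: (-scores[p], -len(p.split()), p))
    return [p for p in ranked if scores[p] > 0][:k]


# Toy pages returned for the (hypothetical) query "Jim Clark".
pages = [
    "Jim Clark won the Formula One world championship driving for Lotus.",
    "Racing driver Jim Clark drove a Lotus in Formula One and won at Indianapolis.",
    "Jim Clark founded Netscape and Silicon Graphics.",
    "Netscape founder Jim Clark later started Healtheon after Silicon Graphics.",
]
token_docs = [tokenize(p, {"jim", "clark"}) for p in pages]
clusters = group_average_clustering(tfidf_vectors(token_docs), threshold=0.1)
for cluster in clusters:
    print(cluster, key_phrases(token_docs, cluster))
# [0, 1] ['formula one', 'formula', 'lotus']
# [2, 3] ['silicon graphics', 'graphics', 'netscape']
```

The ambiguous name is removed before weighting because it occurs on every page and therefore says nothing about which person a page describes. Scoring by the difference of document fractions rewards phrases that are common within one namesake's pages but absent from the others, which is the property a query-refinement phrase needs; on real data a minimum cluster size and a stronger phrase extractor would likely be needed.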
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bollegala, D., Matsuo, Y., Ishizuka, M. (2006). Extracting Key Phrases to Disambiguate Personal Names on the Web. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_24
DOI: https://doi.org/10.1007/11671299_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science; Computer Science (R0)