Exploiting Multiple Resources for Word-Phrase Semantic Similarity Evaluation

Jin, Xiaoqiang; Sun, Chengjie; Lin, Lei; Wang, Xiaolong

doi:10.1007/978-3-319-12277-9_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8801)

Abstract

Previous research on computing semantic similarity has focused mainly on documents, sentences, or concepts. In this paper, we study the semantic similarity of words and compositional phrases. The task is to judge whether a word and a short sequence of words are semantically similar. Drawing on a structured resource (WordNet), a semi-structured resource (Wikipedia), and an unstructured resource (the Web), we extract a rich set of features to represent each word-phrase pair. We treat the task as a binary classification problem and employ a Support Vector Machine to decide whether the word and the phrase in a given pair are similar. Experiments are conducted on SemEval 2013 Task 5a. Our method achieves an accuracy of 82.9%, outperforming the best system that participated in the task (80.3%). These results demonstrate the effectiveness of the proposed approach.
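
The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three feature columns (one score per resource), their values, and all function names are assumptions made up for illustration, and the trainer is a plain hinge-loss sub-gradient loop with no regularisation term (effectively a margin perceptron), swapped in for a real SVM solver to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

# Hypothetical similarity scores in [0, 1] for four word-phrase pairs.
# Columns: WordNet-based, Wikipedia-based, Web-based (all values invented).
FEATURES: List[List[float]] = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.3],
]
# +1 = word and phrase judged similar, -1 = judged dissimilar.
LABELS: List[int] = [1, 1, -1, -1]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_hinge_classifier(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    step: float = 1.0,
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Fixed-step sub-gradient descent on the hinge loss (no regulariser)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            # Update only when the example violates the unit margin.
            if yi * decision_value(w, b, xi) < 1.0:
                w = [wj + step * yi * xj for wj, xj in zip(w, xi)]
                b += step * yi
                updated = True
        if not updated:  # every example already has margin >= 1
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Binary decision: +1 (similar) or -1 (dissimilar)."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of pairs whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


weights, bias = train_hinge_classifier(FEATURES, LABELS)
train_predictions = [predict(weights, bias, x) for x in FEATURES]
train_accuracy = accuracy(train_predictions, LABELS)
```

In practice one would compute the feature columns from the actual resources and hand them to an established SVM implementation; the loop above only shows how a feature vector per word-phrase pair feeds a binary decision and an accuracy score.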

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, China
Xiaoqiang Jin, Chengjie Sun, Lei Lin & Xiaolong Wang

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Haidian District, 100084, Beijing, China
Maosong Sun & Yang Liu
Chinese Academy of Sciences, Institute of Automation, 100190, Beijing, China
Jun Zhao

About this paper

Cite this paper

Jin, X., Sun, C., Lin, L., Wang, X. (2014). Exploiting Multiple Resources for Word-Phrase Semantic Similarity Evaluation. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2014, CCL 2014. Lecture Notes in Computer Science (LNAI), vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_5

DOI: https://doi.org/10.1007/978-3-319-12277-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)
