A Method for Improving Word Representation Using Synonym Information

Conference paper in: Computational Science – ICCS 2021 (ICCS 2021)

Abstract

The emergence of word embeddings has created good conditions for natural language processing, which is used in a growing number of applications related to machine translation and language understanding. Several word-embedding models have been developed and applied, achieving considerably good performance. In addition, several methods for enriching word embeddings have been proposed, incorporating various kinds of information such as polysemy, subwords, and temporal and spatial signals. However, popular prior vector representations of words ignore knowledge of synonyms. This is a drawback, particularly for languages with large vocabularies and numerous synonyms. In this study, we introduce an approach to enriching the vector representation of words with synonym information, in which the vectors are extracted and constructed from the synonyms' context words. Our proposal comprises three main steps: first, the context words of the synonym candidates are extracted by scanning the entire corpus with a context window; second, these context words are grouped into small clusters using latent Dirichlet allocation; and finally, synonyms are identified among the candidates based on their context words and converted into vectors. Compared with recent word representation methods, our proposal achieves considerably good performance in terms of word similarity.
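To make the three steps concrete, the following is a minimal sketch of how such a pipeline could be assembled in Python. It is an illustration under stated assumptions, not the authors' implementation: every function name and parameter here is invented for the example, and scikit-learn's LatentDirichletAllocation stands in for the Gibbs-sampling LDA implementation linked in the notes below.

    # Minimal sketch of the three-step pipeline from the abstract.
    # All names and parameters are illustrative assumptions.
    from collections import defaultdict

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def extract_context_words(corpus, candidates, window=2):
        """Step 1: collect the context words of each synonym candidate
        by scanning the corpus with a fixed-size context window."""
        contexts = defaultdict(list)
        for sentence in corpus:
            tokens = sentence.lower().split()
            for i, tok in enumerate(tokens):
                if tok in candidates:
                    contexts[tok] += tokens[max(0, i - window):i]
                    contexts[tok] += tokens[i + 1:i + 1 + window]
        return contexts

    def cluster_contexts(contexts, n_topics=2):
        """Step 2: group context words into small clusters (topics) with
        latent Dirichlet allocation, treating each candidate's bag of
        context words as one pseudo-document."""
        words = sorted(contexts)
        docs = [" ".join(contexts[w]) for w in words]
        counts = CountVectorizer().fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        theta = lda.fit_transform(counts)  # candidate-by-topic distribution
        return words, theta

    def synonym_vectors(words, theta):
        """Step 3: use each candidate's topic distribution over context
        clusters as its enriched vector; candidates whose contexts fall
        into the same clusters end up with similar vectors."""
        return dict(zip(words, theta))

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    corpus = [
        "the movie was really good and fun",
        "the film was really great and fun",
        "a banana is a yellow fruit",
    ]
    contexts = extract_context_words(corpus, {"movie", "film", "banana"})
    vecs = synonym_vectors(*cluster_contexts(contexts))
    # "movie" and "film" share context words, so their similarity should
    # typically exceed that of "movie" and "banana" on this toy corpus.
    print(cosine(vecs["movie"], vecs["film"]),
          cosine(vecs["movie"], vecs["banana"]))

On this toy corpus, "movie" and "film" share most of their context words, so their topic distributions tend to coincide, while "banana" falls into a different cluster; the paper evaluates this kind of effect on standard word-similarity benchmarks.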

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. https://nlp.stanford.edu/projects/glove/.

  3. https://fasttext.cc/docs/en/crawl-vectors.html.

  4. https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/.

  5. https://gist.github.com/mblondel/542786#file-lda_gibbs-py.

  6. https://www.kaggle.com/azzouza2018/semevaldatadets?select=semeval-2013-train.csv.

  7. https://pypi.org/project/emoji/.

  8. https://pypi.org/project/aspell-python-py2/.

  9. https://github.com/Svobikl/global_context/tree/master/AnalogyTester/evaluation_data.

  10. https://www.nltk.org/_modules/nltk/tokenize/api.html.

Author information

Correspondence to Dosam Hwang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Phan, H.T., Nguyen, N.T., Musaev, J., Hwang, D. (2021). A Method for Improving Word Representation Using Synonym Information. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_28

  • DOI: https://doi.org/10.1007/978-3-030-77967-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77966-5

  • Online ISBN: 978-3-030-77967-2

  • eBook Packages: Computer Science, Computer Science (R0)
