A Method for Improving Word Representation Using Synonym Information

Conference paper in: Computational Science – ICCS 2021 (ICCS 2021)

Abstract

The emergence of word embeddings has created good conditions for natural language processing, which is used in a growing number of applications related to machine translation and language understanding. Several word-embedding models have been developed and applied, achieving considerably good performance. In addition, several methods for enriching word embeddings have been proposed, incorporating various kinds of information such as polysemy, subwords, and temporal and spatial signals. However, popular prior vector representations of words ignore knowledge of synonyms. This is a drawback, particularly for languages with large vocabularies and numerous synonyms. In this study, we introduce an approach to enriching the vector representation of words with synonym information, in which the vectors are extracted and constructed from the synonyms' context words. Our proposal comprises three main steps: first, the context words of the synonym candidates are extracted by scanning the entire corpus with a context window; second, these context words are grouped into small clusters using latent Dirichlet allocation; and finally, synonyms are identified among the candidates based on their context words and converted into vectors. Compared with recent word representation methods, our proposal achieves considerably good performance in terms of word similarity.
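To make the three steps concrete, the following is a minimal sketch of how such a pipeline could be assembled in Python. It is an illustration under stated assumptions, not the authors' implementation: every function name and parameter here is invented for the example, and scikit-learn's LatentDirichletAllocation stands in for the Gibbs-sampling LDA implementation linked in the notes below.

    # Minimal sketch of the three-step pipeline from the abstract.
    # All names and parameters are illustrative assumptions.
    from collections import defaultdict

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def extract_context_words(corpus, candidates, window=2):
        """Step 1: collect the context words of each synonym candidate
        by scanning the corpus with a fixed-size context window."""
        contexts = defaultdict(list)
        for sentence in corpus:
            tokens = sentence.lower().split()
            for i, tok in enumerate(tokens):
                if tok in candidates:
                    contexts[tok] += tokens[max(0, i - window):i]
                    contexts[tok] += tokens[i + 1:i + 1 + window]
        return contexts

    def cluster_contexts(contexts, n_topics=2):
        """Step 2: group context words into small clusters (topics) with
        latent Dirichlet allocation, treating each candidate's bag of
        context words as one pseudo-document."""
        words = sorted(contexts)
        docs = [" ".join(contexts[w]) for w in words]
        counts = CountVectorizer().fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        theta = lda.fit_transform(counts)  # candidate-by-topic distribution
        return words, theta

    def synonym_vectors(words, theta):
        """Step 3: use each candidate's topic distribution over context
        clusters as its enriched vector; candidates whose contexts fall
        into the same clusters end up with similar vectors."""
        return dict(zip(words, theta))

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    corpus = [
        "the movie was really good and fun",
        "the film was really great and fun",
        "a banana is a yellow fruit",
    ]
    contexts = extract_context_words(corpus, {"movie", "film", "banana"})
    vecs = synonym_vectors(*cluster_contexts(contexts))
    # "movie" and "film" share context words, so their similarity should
    # typically exceed that of "movie" and "banana" on this toy corpus.
    print(cosine(vecs["movie"], vecs["film"]),
          cosine(vecs["movie"], vecs["banana"]))

On this toy corpus, "movie" and "film" share most of their context words, so their topic distributions tend to coincide, while "banana" falls into a different cluster; the paper evaluates this kind of effect on standard word-similarity benchmarks.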

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. https://nlp.stanford.edu/projects/glove/.

  3. https://fasttext.cc/docs/en/crawl-vectors.html.

  4. https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/.

  5. https://gist.github.com/mblondel/542786#file-lda_gibbs-py.

  6. https://www.kaggle.com/azzouza2018/semevaldatadets?select=semeval-2013-train.csv.

  7. https://pypi.org/project/emoji/.

  8. https://pypi.org/project/aspell-python-py2/.

  9. https://github.com/Svobikl/global_context/tree/master/AnalogyTester/evaluation_data.

  10. https://www.nltk.org/_modules/nltk/tokenize/api.html.

Author information

Correspondence to Dosam Hwang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Phan, H.T., Nguyen, N.T., Musaev, J., Hwang, D. (2021). A Method for Improving Word Representation Using Synonym Information. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12744. Springer, Cham. https://doi.org/10.1007/978-3-030-77967-2_28

  • DOI: https://doi.org/10.1007/978-3-030-77967-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77966-5

  • Online ISBN: 978-3-030-77967-2

  • eBook Packages: Computer Science, Computer Science (R0)
