Building Large Arabic Multi-domain Resources for Sentiment Analysis

ElSahar, Hady; El-Beltagy, Samhaa R.

doi:10.1007/978-3-319-18117-2_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

While there has been recent progress in the area of Arabic Sentiment Analysis, most of the resources in this area are of limited size, domain specific, or not publicly available. In this paper, we address this problem by generating large multi-domain datasets for Sentiment Analysis in Arabic. The datasets were scraped from different reviewing websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products. Moreover, we build multi-domain lexicons from the generated datasets. Different experiments have been carried out to validate the usefulness of the datasets and the generated lexicons for the task of sentiment classification. From the experimental results, we highlight some useful insights addressing the best performing classifiers and feature representation methods, the effect of introducing lexicon-based features, and factors affecting the accuracy of sentiment classification in general. All the datasets, experiment code and results have been made publicly available for scientific purposes.

Author information

Authors and Affiliations

Center of Informatics Sciences, Nile University, Cairo, Egypt
Hady ElSahar & Samhaa R. El-Beltagy

Corresponding author

Correspondence to Hady ElSahar.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh

About this paper

Cite this paper

ElSahar, H., El-Beltagy, S.R. (2015). Building Large Arabic Multi-domain Resources for Sentiment Analysis. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_2

DOI: https://doi.org/10.1007/978-3-319-18117-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science, Computer Science (R0)