Building Large Arabic Multi-domain Resources for Sentiment Analysis

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Abstract

While there has been recent progress in Arabic Sentiment Analysis, most of the resources in this area are of limited size, domain-specific, or not publicly available. In this paper, we address this problem by generating large multi-domain datasets for Sentiment Analysis in Arabic. The datasets were scraped from different reviewing websites and consist of a total of 33K annotated reviews for movies, hotels, restaurants and products. Moreover, we build multi-domain lexicons from the generated datasets. Different experiments have been carried out to validate the usefulness of the datasets and the generated lexicons for the task of sentiment classification. From the experimental results, we highlight some useful insights addressing the best-performing classifiers and feature representation methods, the effect of introducing lexicon-based features, and factors affecting the accuracy of sentiment classification in general. All the datasets, experiment code and results have been made publicly available for scientific purposes.

Author information

Corresponding author

Correspondence to Hady ElSahar.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

ElSahar, H., El-Beltagy, S.R. (2015). Building Large Arabic Multi-domain Resources for Sentiment Analysis. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_2

  • DOI: https://doi.org/10.1007/978-3-319-18117-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science, Computer Science (R0)
