Machine Learning Implementations in Arabic Text Classification

Intelligent Natural Language Processing: Trends and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Text categorization is the process of assigning to a piece of text a label that describes its thematic content. Although this task has been investigated extensively for other languages, it has not been researched as thoroughly for Arabic. In this chapter, we summarize the major techniques used to address different aspects of the text classification problem. These aspects include problem formalization using the vector space model, term weighting, feature reduction, and classification algorithms. We pay special attention to research devoted to text categorization in Arabic. We conclude that the language of the text has only a minimal effect on this task. Moreover, we list the currently unsolved issues in text classification and thereby highlight the active research directions.

Notes

  1. BOOSTEXTER is available from http://www.cs.princeton.edu/~schapire/boostexter.html.

Author information

Corresponding author

Correspondence to Ali Farghaly.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Elarnaoty, M., Farghaly, A. (2018). Machine Learning Implementations in Arabic Text Classification. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_15

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)
