Abstract
Text categorization is the process of assigning a piece of text a label that describes its thematic content. Although this task has been investigated extensively for many languages, it has not been researched thoroughly for Arabic. In this chapter, we summarize the major techniques used to address different aspects of the text classification problem. These aspects include problem formalization using the vector space model, term weighting, feature reduction, and classification algorithms. We pay special attention to research devoted to text categorization in Arabic. We conclude that language has only a minimal effect on this task. Moreover, we list the currently unsolved issues in text classification and thereby highlight the active research directions.
Notes
1. BOOSTEXTER is available from http://www.cs.princeton.edu/~schapire/boostexter.html.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Elarnaoty, M., Farghaly, A. (2018). Machine Learning Implementations in Arabic Text Classification. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_15
DOI: https://doi.org/10.1007/978-3-319-67056-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)