Design Consideration of Malay Text Stemmer Using Structured Approach

  • Conference paper
Smart Trends in Computing and Communications

Abstract

A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is typically used as a preprocessing tool in text classification, text mining, and information retrieval tasks. The design of an effective text stemmer is therefore crucial to ensuring that the stemming process maps morphological variants to the correct base forms. This paper investigates the design considerations of an effective text stemmer from the perspective of the Malay language. These considerations are based on the challenges faced by previous researchers in stemming Malay texts. A text stemmer that adopts these considerations is expected to address common stemming errors and to achieve promising stemming accuracy.
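
As a rough illustration of what mapping morphological variants to a common base form means, the following Python sketch strips a prefix and/or a suffix and accepts the result only if it appears in a root-word list. It is a toy example, not the stemmer proposed in this paper: the affix lists, the root-word set, and the function name `stem` are assumptions chosen for illustration, and a real Malay stemmer would need far larger resources and rules for spelling variations, reduplication, and compounding.

```python
# Toy affix lists: a small illustrative subset, assumed for this sketch only.
PREFIXES = ("mem", "ber", "di", "ter")
SUFFIXES = ("kan", "an", "i")

# Hypothetical root-word list used to validate candidate stems.
ROOTS = {"makan", "main", "baca", "ajar"}


def stem(word: str) -> str:
    """Return the root of `word` if stripping affixes yields a known root."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Try every combination of (optional prefix, optional suffix).
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            # Accept a candidate only when the root-word list confirms it.
            if candidate in ROOTS:
                return candidate
    # No known root found: leave the word unchanged rather than guess.
    return word


if __name__ == "__main__":
    for w in ["makanan", "dimakan", "bermain", "membaca", "diajarkan", "kucing"]:
        print(w, "->", stem(w))
```

Checking each candidate against a root-word list is one simple way to avoid stripping too much from a word; words whose root is not in the list are returned unchanged.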

Acknowledgements

The authors would like to thank the Editor-in-Chief and the anonymous reviewers of the manuscript for their valuable comments and suggestions. This research was funded by Universiti Teknologi Malaysia’s Research University Grant (VUP) PY/2017/01736.

Author information

Corresponding author

Correspondence to Mohamad Nizam Kassim.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kassim, M.N., Jali, S.H.M., Maarof, M.A., Zainal, A., Wahab, A.A. (2020). Design Consideration of Malay Text Stemmer Using Structured Approach. In: Zhang, YD., Mandal, J., So-In, C., Thakur, N. (eds) Smart Trends in Computing and Communications. Smart Innovation, Systems and Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-15-0077-0_43
