Text Censoring System for Filtering Malicious Content Using Approximate String Matching and Bayesian Filtering

Ghauth, Khairil Imran; Sukhur, Muhammad Shurazi

doi:10.1007/978-3-319-13153-5_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 331)

Abstract

Information encountered today often contains malicious content. Such content, for example profane words, should be censored because it can influence young readers and incite hatred. To censor profane words, this paper introduces a hybrid text censoring method based on Bayesian filtering and approximate string matching. Bayesian filtering is used to detect the malicious content (profane words), while approximate string matching is used to make the detection of profane words more effective. The system's performance was evaluated using Precision, Recall, F-measure and MAE. The results show that the Bayesian filtering technique can be used to filter profane words.
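
The abstract does not say how the two techniques are wired together, so the following Python sketch shows only one plausible arrangement and is not the authors' implementation. It assumes that approximate string matching first maps an obfuscated token to its closest lexicon entry, and that a Graham-style per-token Bayesian probability then decides whether to mask it. The lexicon, the counts, the thresholds and all function names are hypothetical, and `difflib.SequenceMatcher` stands in for whichever similarity measure the chapter actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical placeholder lexicon; the chapter's real word list is not reproduced here.
LEXICON = ["darn", "heck"]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] computed by the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_lexicon_word(token: str, threshold: float = 0.7):
    """Return the lexicon entry most similar to token, or None if below threshold."""
    best = max(LEXICON, key=lambda word: similarity(token, word))
    return best if similarity(token, best) >= threshold else None


def token_probability(bad_count: int, good_count: int, n_bad: int, n_good: int) -> float:
    """Bayes' rule with equal priors: P(profane | token) from per-class frequencies."""
    p_bad = bad_count / n_bad
    p_good = good_count / n_good
    if p_bad + p_good == 0:
        return 0.5  # unseen token: no evidence either way
    return p_bad / (p_bad + p_good)


def censor(text: str, counts: dict, n_bad: int, n_good: int, cutoff: float = 0.8) -> str:
    """Mask tokens whose (approximately matched) form is probably profane.

    counts maps a word to (occurrences in profane samples, occurrences in clean samples).
    """
    masked = []
    for token in text.split():
        # Assumed step 1: normalise obfuscated spellings via approximate matching.
        canonical = closest_lexicon_word(token) or token.lower()
        # Assumed step 2: score the canonical form with the Bayesian estimate.
        bad, good = counts.get(canonical, (0, 0))
        probability = token_probability(bad, good, n_bad, n_good)
        masked.append("*" * len(token) if probability >= cutoff else token)
    return " ".join(masked)


if __name__ == "__main__":
    # Made-up training counts: "darn" seen in 9 of 10 profane and 1 of 10 clean samples.
    toy_counts = {"darn": (9, 1)}
    print(censor("well d4rn it", toy_counts, n_bad=10, n_good=10))
```

In this arrangement a loose similarity threshold also maps innocent near-neighbours onto lexicon entries, so the threshold trades recall against precision and would need tuning against labelled data.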

Author information

Authors and Affiliations

Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia
Khairil Imran Ghauth & Muhammad Shurazi Sukhur

Corresponding author

Correspondence to Khairil Imran Ghauth.

Editor information

Editors and Affiliations

Jalan Tungku Link, Institut Teknologi Brunei, Gadong, Brunei Darussalam
Somnuk Phon-Amnuaisuk & Thien Wan Au

About this paper

Cite this paper

Ghauth, K.I., Sukhur, M.S. (2015). Text Censoring System for Filtering Malicious Content Using Approximate String Matching and Bayesian Filtering. In: Phon-Amnuaisuk, S., Au, T. (eds) Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing, vol 331. Springer, Cham. https://doi.org/10.1007/978-3-319-13153-5_15

DOI: https://doi.org/10.1007/978-3-319-13153-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13152-8
Online ISBN: 978-3-319-13153-5
eBook Packages: Engineering, Engineering (R0)
