Text Censoring System for Filtering Malicious Content Using Approximate String Matching and Bayesian Filtering

  • Conference paper
Computational Intelligence in Information Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 331)

Abstract

Information obtained today often contains malicious content. Such content, for example profane words, has to be censored because it can influence young people and create hatred among people. To censor profane words, this paper introduces a hybrid text censoring method based on Bayesian filtering and approximate string matching. Bayesian filtering is used to detect the malicious content (profane words), while approximate string matching is used to make the detection of profane words more effective. The performance of the proposed system was evaluated using precision, recall, F-measure and mean absolute error (MAE). The results show that Bayesian filtering can be used to filter profane words.
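The abstract does not give the exact formulas, so the following is only a minimal Python sketch of how such a hybrid filter and its evaluation could fit together. It makes several assumptions that are not taken from the paper. Tokenization is plain whitespace splitting. The approximate string matching step uses a normalized Levenshtein similarity, which may differ from the measure the authors used. The Bayesian step uses a Graham-style per-word probability with Laplace smoothing and equal class priors. The class `BayesianWordFilter`, its two thresholds, and the toy training sentences are all hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


class BayesianWordFilter:
    """Hypothetical hybrid filter: approximate matching, then per-word Bayes."""

    def __init__(self, prob_threshold: float = 0.7, match_threshold: float = 0.75):
        self.bad, self.good = Counter(), Counter()  # messages containing each word
        self.n_bad = self.n_good = 0                 # training messages per class
        self.prob_threshold = prob_threshold
        self.match_threshold = match_threshold

    def train(self, text: str, is_malicious: bool) -> None:
        words = set(text.lower().split())  # count each word once per message
        if is_malicious:
            self.bad.update(words)
            self.n_bad += 1
        else:
            self.good.update(words)
            self.n_good += 1

    def word_probability(self, word: str) -> float:
        """P(malicious | word) with equal class priors and Laplace smoothing."""
        b = (self.bad[word] + 1) / (self.n_bad + 2)
        g = (self.good[word] + 1) / (self.n_good + 2)
        return b / (b + g)

    def normalize(self, token: str) -> str:
        """Map an obfuscated token (e.g. 'idi0t') to its closest known word."""
        vocabulary = sorted(set(self.bad) | set(self.good))  # sorted: deterministic ties
        if not vocabulary:
            return token
        best = max(vocabulary, key=lambda w: similarity(token, w))
        return best if similarity(token, best) >= self.match_threshold else token

    def censor(self, text: str) -> str:
        out = []
        for token in text.split():
            word = self.normalize(token.lower())
            if self.word_probability(word) >= self.prob_threshold:
                out.append("*" * len(token))  # mask, keeping the original length
            else:
                out.append(token)
        return " ".join(out)


def evaluate(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Precision, recall, F-measure and MAE for per-word censor decisions."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mae = sum(abs(int(p) - int(a)) for p, a in pairs) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure, "mae": mae}


if __name__ == "__main__":
    word_filter = BayesianWordFilter()
    for text, malicious in [("you are a stupid idiot", True), ("what an idiot move", True),
                            ("stupid idiot", True), ("you are a kind friend", False),
                            ("what a nice move", False)]:
        word_filter.train(text, malicious)
    print(word_filter.censor("you are a stupd idi0t"))  # you are a ***** *****
```

In this sketch, the approximate matching step runs before the Bayesian lookup. Misspelled or obfuscated tokens such as "stupd" and "idi0t" therefore inherit the probability of the dictionary word they resemble, which a pure Bayesian filter would treat as unseen words. For binary per-word decisions, the MAE computed here equals the fraction of misclassified words.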

Author information

Corresponding author

Correspondence to Khairil Imran Ghauth.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ghauth, K.I., Sukhur, M.S. (2015). Text Censoring System for Filtering Malicious Content Using Approximate String Matching and Bayesian Filtering. In: Phon-Amnuaisuk, S., Au, T. (eds) Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing, vol 331. Springer, Cham. https://doi.org/10.1007/978-3-319-13153-5_15

  • DOI: https://doi.org/10.1007/978-3-319-13153-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13152-8

  • Online ISBN: 978-3-319-13153-5

  • eBook Packages: Engineering, Engineering (R0)
