Graph Model for Pattern Recognition in Text

  • Chapter
Mining and Analyzing Social Networks

Part of the book series: Studies in Computational Intelligence ((SCI,volume 288))

Abstract

In this paper, we propose a novel approach that uses a weighted directed multigraph for text pattern recognition. Instead of the traditional model, which classifies text by keyword frequency, we build a weighted directed multigraph in which the distances between keywords serve as the weights of the arcs. We then develop a keyword-frequency-distance-based algorithm that uses not only the frequency of keywords but also their ordering. We apply this idea to the detection of plagiarized papers and to the detection of fraudulent emails written by the same person. The results of these case studies show that the new method performs much better than traditional methods.
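The abstract does not give the construction itself, so the following is only a minimal sketch of one way such a graph could be built. It assumes that each keyword is a vertex, that every ordered pair of keyword occurrences produces one arc from the earlier keyword to the later one, and that the arc weight is the number of tokens separating the two occurrences. The function name `build_keyword_multigraph`, the `max_distance` cutoff, and the regular-expression tokenizer are hypothetical choices made for illustration and are not taken from the chapter.

```python
import re
from collections import defaultdict


def build_keyword_multigraph(text, keywords, max_distance=5):
    """Return {(u, v): [d1, d2, ...]} for a weighted directed multigraph.

    Each list entry is one parallel arc from keyword u to keyword v,
    weighted by the token distance between the two occurrences.
    Assumption: only pairs at most max_distance tokens apart are linked.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    keyword_set = {k.lower() for k in keywords}

    # Keyword occurrences in reading order, as (token position, keyword).
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in keyword_set]

    arcs = defaultdict(list)
    for a, (i, u) in enumerate(hits):
        for j, v in hits[a + 1:]:
            distance = j - i
            if distance > max_distance:
                break  # later occurrences are even farther away
            arcs[(u, v)].append(distance)  # direction follows word order
    return dict(arcs)


sample = "graph model for text: the graph encodes keyword order in text"
graph = build_keyword_multigraph(sample, ["graph", "text", "keyword"], max_distance=6)
for (u, v), weights in sorted(graph.items()):
    print(f"{u} -> {v}: {weights}")
```

Because arcs are directed, the pair ("graph", "text") and the pair ("text", "graph") are recorded separately, which is how ordering information survives alongside frequency (the number of parallel arcs). A repeated keyword also produces a self-loop under this sketch; whether the chapter's model keeps such loops is not stated in the abstract.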

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wu, Q., Fuller, E., Zhang, CQ. (2010). Graph Model for Pattern Recognition in Text. In: Ting, IH., Wu, HJ., Ho, TH. (eds) Mining and Analyzing Social Networks. Studies in Computational Intelligence, vol 288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13422-7_1

  • DOI: https://doi.org/10.1007/978-3-642-13422-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13421-0

  • Online ISBN: 978-3-642-13422-7

  • eBook Packages: Engineering, Engineering (R0)
