
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5253)

Abstract

Multilingual text processing has attracted growing attention in recent years, a trend accentuated by the integration of European states and the fading of cultural and social boundaries. It has become an important field that raises many new and interesting problems. This paper describes a novel approach to multilingual plagiarism detection. We propose a new method, MLPlag, for detecting plagiarism in a multilingual environment. The method is based on the analysis of word positions and uses the EuroWordNet thesaurus to transform words into a language-independent form, which makes it possible to identify documents plagiarized from sources written in other languages. Special techniques, such as semantic-based word normalization, were incorporated to refine the method. As a result, the method identifies the synonym replacements that plagiarists use to hide a document match. We evaluated the method in experiments on monolingual and multilingual corpora, and the results are presented in this paper.
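The idea of mapping words to a language-independent form and then comparing word positions can be illustrated with a minimal sketch. This is not the MLPlag algorithm itself, whose details are not given in the abstract. The toy thesaurus, the concept IDs, the function names `to_concepts` and `positional_overlap`, and the window-based score are all assumptions made for illustration; a real system would look up concepts in EuroWordNet and would lemmatize words first.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: (language, word) -> language-independent concept ID.
# The entries and IDs are invented for illustration; they are not EuroWordNet data.
TOY_THESAURUS: Dict[Tuple[str, str], str] = {
    ("en", "dog"): "C1", ("en", "hound"): "C1", ("cs", "pes"): "C1",
    ("en", "chase"): "C2", ("en", "pursue"): "C2", ("cs", "honit"): "C2",
    ("en", "cat"): "C3", ("cs", "kočka"): "C3",
}


def to_concepts(text: str, lang: str) -> List[str]:
    """Map each known word to its concept ID; unknown words are dropped."""
    concepts = []
    for word in text.lower().split():
        concept = TOY_THESAURUS.get((lang, word))
        if concept is not None:
            concepts.append(concept)
    return concepts


def positional_overlap(a: List[str], b: List[str], window: int = 2) -> float:
    """Fraction of positions in `a` whose concept also occurs in `b`
    within `window` positions of the same index (an assumed, simplified score)."""
    if not a:
        return 0.0
    hits = 0
    for i, concept in enumerate(a):
        lo = max(0, i - window)
        hi = i + window + 1
        if concept in b[lo:hi]:
            hits += 1
    return hits / len(a)


# Synonyms and translations collapse to the same concept sequence.
suspicious = to_concepts("hound pursue cat", "en")
source = to_concepts("pes honit kočka", "cs")
score = positional_overlap(suspicious, source)
```

Because synonyms ("dog", "hound") and translations ("pes") share one concept ID in this toy mapping, replacing a word with a synonym or translating it does not change the concept sequence, which is the effect the abstract attributes to semantic-based word normalization.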




Editor information

Danail Dochev Marco Pistore Paolo Traverso


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ceska, Z., Toman, M., Jezek, K. (2008). Multilingual Plagiarism Detection. In: Dochev, D., Pistore, M., Traverso, P. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2008. Lecture Notes in Computer Science, vol 5253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85776-1_8


  • DOI: https://doi.org/10.1007/978-3-540-85776-1_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85775-4

  • Online ISBN: 978-3-540-85776-1

  • eBook Packages: Computer Science, Computer Science (R0)
