Plagiarism Detection: Methodological Approaches

Language as Evidence

Abstract

This chapter deals with plagiarism detection. After explaining the difference between plagiarism and copyright infringement, the chapter analyses the problems that challenge the expert linguist’s work, especially the undervaluation of scientific linguistic expertise and the admissibility of scientific evidence in the courts of justice, and the evaluation of text similarity. Subsequently, the chapter examines plagiarism frameworks and addresses the latest research in computer-based plagiarism detection methods and their implementation in automated plagiarism detection systems. Furthermore, the chapter points to the essential complementary role that qualitative linguistic analysis plays in plagiarism detection and draws attention to the relevance of context in understanding and interpreting the data appropriately. Lastly, the chapter provides the reader with a detailed step-by-step analysis of a live case of plagiarism between translators.

Notes

  1. European Parliament. (2018). Copyright Law in the EU. Salient Features of Copyright Law across the EU Member States. European Parliamentary Research Service. Study. Retrieved from https://www.europarl.europa.eu/RegData/etudes/STUD/2018/625126/EPRS_STU(2018)625126_EN.pdf.

  2. Spanish law, a civil law jurisdiction, explicitly protects the authors’ moral rights under art. 14 (Content and Characteristics of Moral Rights) of the Intellectual Property Act 1/1996: (1) The right to disclosure; (2) The right to determine how communication with the public should be effected; (3) The right to claim authorship; (4) The right to demand respect for the integrity of the work; (5) The right to modify the work with the permission of the copyright holder; (6) The right to withdraw the work due to changes in intellectual or moral convictions and (7) The right of access to the sole or rare copy of the work.

  3. The other three enforceable limitations to the general public’s freedom of speech are patents, trademarks—and service marks—and trade secrets.

  4. Copyright limitations are transnational in scope for most countries due to international treaties such as the Berne Convention of 1886, the UNESCO Universal Copyright Convention of 1952, the World Trade Organisation’s TRIPS Agreement of 1995 and the WIPO Copyright Treaty of 1996.

  5. By way of example, the author has acted as an expert linguist in only four plagiarism cases over the last thirteen years, of which only two were court cases. In both, the author acted as expert for the defendant. One case concerned plagiarism between lawyers (Guillén-Nieto, 2020b); the other concerned electronic material allegedly plagiarised and incorporated into a teaching project.

  6. Retrieved January 3, 2021, from https://www.law.cornell.edu/wex/frye_standard.

  7. Retrieved January 3, 2021, from https://www.law.cornell.edu/wex/daubert_standard.

  8. Retrieved from https://www.boe.es/buscar/doc.php?id=BOE-A-2000-323.

  9. Retrieved from https://www.boe.es/buscar/act.php?id=BOE-A-1882-6036.

  10. Kraus (2016) offers a comprehensive review of plagiarism detection systems and evaluation methods until 2012. Furthermore, Foltýnek et al. (2019) provide an exhaustive critical review evaluating the capabilities of computer-based academic plagiarism detection methods from 2013 to 2018. Over this period one can see major advances concerning the automated detection of obfuscated academic plagiarism forms.

  11. PAN is a well-established platform for the comparative evaluation of authorship identification and plagiarism detection methods and systems. Retrieved from https://pan.webis.de/.

  12. Retrieved from https://www.ithenticate.com/.

  13. Retrieved from https://www.plagscan.com/es/?gclid=Cj0KCQiAx9mABhD0ARIsAEfpavR5s-cCrTnF608Lius5CnmZPtqJfK4JB0r5NZpKTjvN0OE9-mLhoBoaAoCKEALw_wcB.

  14. Retrieved from https://www.turnitin.com/.

  15. Retrieved from https://unicheck.com/es-es.

  16. Retrieved from https://www.articlechecker.com/.

  17. Retrieved from https://www.copyscape.com/.

  18. Retrieved from https://antiplagiarist.softonic.com/.

  19. Retrieved from https://www.duplichecker.com/.

  20. Retrieved from https://www.plagium.com/.

  21. Retrieved from https://plag.co/.

  22. For confidentiality reasons, reference to the suspect translator is omitted.

  23. Woolls (2012) explains that ʻin order to avoid over-matching, function words, due to their high frequencies in a language, are collected together on what is termed a “stop-list” and discounted altogether for vocabulary comparison purposesʼ (p. 525).

  24. CREA (3.2 June 2008) is a current Spanish database that contains 160,000,000 linguistic forms from written and oral texts produced in all Spanish speaking countries from 1975 until 2004. The written texts have been selected from books, journals and magazines.

  25. CORDE is a database of diachronic Spanish. It contains 250 million linguistic forms from a wide range of genres from the Spanish language’s origins until 1974.
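
The stop-list idea quoted in note 23 can be illustrated with a minimal Python sketch. It is not the procedure used in the chapter or in Woolls (2012): the tiny English `STOP_LIST`, the function names and the choice of a Jaccard ratio as the similarity measure are all assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny stop-list of English function words;
# a real analysis would use a fuller, language-specific list.
STOP_LIST = {"the", "a", "an", "of", "and", "to", "in", "for", "was", "her", "his"}


def content_vocabulary(text, stop_list=STOP_LIST):
    """Return the lower-cased word types in `text` that are not on the stop-list."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {token for token in tokens if token not in stop_list}


def shared_vocabulary_ratio(text_a, text_b, stop_list=STOP_LIST):
    """Jaccard overlap of the two content vocabularies (0.0 if both are empty)."""
    vocab_a = content_vocabulary(text_a, stop_list)
    vocab_b = content_vocabulary(text_b, stop_list)
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


if __name__ == "__main__":
    a = "The nightingale sang for the student"
    b = "The student wept in the garden"
    # Compare the ratio with and without discounting function words.
    print(shared_vocabulary_ratio(a, b))
    print(shared_vocabulary_ratio(a, b, stop_list=set()))
```

In this toy pair, discounting the function words lowers the overlap, because the shared word "the" no longer counts as a match; this is the over-matching effect that the stop-list is meant to avoid.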

References

  • Ainsworth, J., & Juola, P. (2019). Who wrote this? Modern forensic authorship analysis as a model for valid forensic science. Washington University Law Review, 96(5), 1161–1189.

  • Bakhtin, M. (1981). The dialogic imagination: Four essays (Ed. M. Holquist; Trans. C. Emerson, & M. Holquist). Austin: University of Texas Press.

  • Bazerman, C. (2004). Intertextuality: How texts rely on other texts. In C. Bazerman, & P. Prior (Eds.), What writing does and how it does it (pp. 309–339). Lawrence Erlbaum.

  • Butters, R. R. (2008). Trademarks and other proprietary terms. In J. Gibbons, & M. Teresa Turell (Eds.), Dimensions of forensic linguistics (pp. 231–247). John Benjamins Publishing Company.

  • Butters, R. R. (2012). Language and copyright. In P. M. Tiersma, & L. M. Solan (Eds.), The Oxford handbook of language and law (pp. 463–477). Oxford University Press.

  • Chaski, C. (2013). Best practices and admissibility of forensic author identification. Journal of Law and Policy, 21, 333–376. https://brooklynworks.brooklaw.edu/jlp/vol21/iss2/5

  • Chatterjee-Padmanabhen, M. (2014). Bakhtin’s theory of heteroglossia/intertextuality in teaching academic writing in higher education. Journal of Academic Language & Learning, 8(3), A101–A112.

  • Copyright, Designs and Patents Act. 1988. https://www.legislation.gov.uk/ukpga/1988/48/contents

  • Coulthard, M., & Johnson, A. (Eds.). (2007). An introduction to forensic linguistics: Language in evidence. Routledge.

  • Coulthard, M., Johnson, A., Kredens, K., & Woolls, D. (2010). Four forensic linguists’ responses to suspected plagiarism. In M. Coulthard, & A. Johnson (Eds.), An introduction to forensic linguistics: Language in evidence (pp. 523–538). Routledge.

  • Daubert Standard. (1993). Retrieved on January 8, 2021, from https://www.law.cornell.edu/wex/daubert_standard

  • Daubert v Merrell Dow Pharmaceuticals. (1993). (US). https://caselaw.findlaw.com/us-supreme-court/509/579.html

  • De Luca, S., Navarro, F., & Cameriere, R. (2013). La prueba pericial y su valoración en el ámbito judicial español. Revista electrónica de ciencia penal y criminología. Artículos RECPC, 15(19), 1–14.

  • Eggington, W. G. (2008). Deception and fraud. In J. Gibbons & M. T. Turell (Eds.), Dimensions of forensic linguistics (pp. 249–264). John Benjamins Publishing Company.

  • Ehrhardt, S. (2018). Authorship attribution analysis. In J. Visconti (Ed.), Handbook of communication in the legal sphere (pp. 169–200). Boston.

  • European Parliament. (2018). Copyright law in the EU. Salient features of copyright law across the EU member states. European Parliamentary Research Service. Study. https://www.europarl.europa.eu/RegData/etudes/STUD/2018/625126/EPRS_STU(2018)625126_EN.pdf

  • Foltýnek, T., Meuschke, N., & Gipp, B. (2019). Academic plagiarism detection: A systematic literature review. ACM Computing Surveys, 52(6), 1–42. https://doi.org/10.1145/3345317

  • Franco-Salvador, M., Gupta, P., & Rosso, P. (2013). Cross-language plagiarism detection using a multilingual semantic network. In P. Serdyukov et al. (Eds.), Advances in information retrieval. ECIR 2013. Lecture notes in computer science (pp. 710–713). Springer. https://doi.org/10.1007/978-3-642-36973-5_66

  • Frye Standard. (1923). Retrieved on January 6, 2021, from https://www.law.cornell.edu/wex/frye_standard

  • Frye v United States. (1923). 293 F. 1013 (US). https://www.mass.gov/doc/frye-v-united-states-293-f-1013-dc-cir-1923/download

  • Gamini Fonseka, E. A. (2020). Sacrifice unacknowledged: A literary analysis of the nightingale and the rose by Oscar Wilde. American Research Journal of English and Literature, 6(1), 1–8. https://doi.org/10.21694/2378-9026.20010

  • Gil, L., Soler, C., Stuart, K., & Candela, J. (2004). TextWorks. Departamento de Idiomas, Universidad Politécnica de Valencia.

  • Gipp, B. (2014). Citation-based plagiarism detection. In Citation-based plagiarism detection (pp. 57–88). Springer. https://doi.org/10.1007/978-3-658-06394-8_4

  • Green, S. P. (2002). Plagiarism, norms, and the limits of the theft law: Some observations on the use of criminal sanctions in enforcing intellectual property rights. Hastings Law Journal, 54(1), 167–242. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=315562

  • Guillén-Nieto, V. (2011). The expert as witness in the CTM courts. International Journal of Applied Linguistics (ITL), 162, 63–83.

  • Guillén-Nieto, V. (2020a). Defamation as a language crime: A socio-pragmatic approach to defamation cases in the high courts of justice of Spain. International Journal of Language & Law (JLL), 9, 1–22.

  • Guillén-Nieto, V. (2020b). The relevance of context in plagiarism detection: The case of a professional legal genre. Ibérica, 40, 101–122.

  • Guillén-Nieto, V. (2021). ʻWhat else can you do to pass…?ʼ A pragmatics-based approach to quid-pro-quo sexual harassment. In J. Giltrow, F. Olsen, & D. Mancini (Eds.), Legal meanings and language rights. International, social and philosophical perspectives (pp. 31–55). de Gruyter Mouton. https://doi.org/10.1515/9783110720969

  • Hage, J., Rademaker, P., & van Vugt, N. (2010). A comparison of plagiarism detection tools. In Technical report UU-CS-2010-015 (pp. 1–26). Department of Information and Computing Sciences, Utrecht University. http://www.cs.uu.nl/research/techreps/repo/CS-2010/2010-015.pdf

  • Hussain, F., & Suryani, M. A. (2015). On retrieving intelligently plagiarized documents using semantic similarity. Engineering Applications of Artificial Intelligence, 45, 246–258. https://doi.org/10.1016/j.engappai.2015.07.011

  • Kraus, C. (2016). Plagiarism detection. State-of-the-art systems (2016) and evaluation methods. Retrieved from arXiv:1603.03014v1 [cs.IR].

  • Kristeva, J. (1980). The bounded text. In L. Roudiez, T. Gora, & A. Jardine (Eds.), Desire in language: A semiotic approach to literature and art (pp. 36–63). Columbia University Press.

  • Love, H. (2002). Attributing authorship. An introduction. Cambridge University Press.

  • Lukashenko, R., Graudina, V., & Grundspenkis, J. (2007). Computer-based plagiarism detection methods and tools: An overview. Proceedings of the 2007 International Conference on Computer Systems and Technologies, 18, 1–6. https://dl.acm.org/doi/10.1145/1330598.1330642

  • Meuschke, N., Gipp, B., & Lipinski, M. (2015). CITREC: An evaluation framework for citation-based similarity measures based on TREC genomics and PubMed Central. In iConference 2015 Proceedings. http://hdl.handle.net/2142/73680

  • Meuschke, N., Schubotz, M., Hamborg, F., Skopal, T., & Gipp, B. (2017). Analyzing mathematical content to detect academic plagiarism. CIKM’17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2211–2214. https://doi.org/10.1145/3132847.3133144

  • Meyer zu Eissen, S., & Stein, B. (2006). Intrinsic plagiarism detection. Lecture Notes in Computer Science, 3936, 565–569. https://doi.org/10.1007/11735106_66

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. Computation and Language, 1–12. https://arxiv.org/abs/1301.3781

  • Nicklaus, M., & Stein, D. (2020). The role of linguistics in veracity evaluation. International Journal of Language & Law (JLL), 9, 23–47.

  • Osman, A. H., Salim, N., Binwahlan, M. S., Alteeb, R., & Abuobieda, A. (2012). An improved plagiarism detection scheme based on semantic role labelling. Applied Soft Computing, 12(5), 1493–1502. https://doi.org/10.1016/j.asoc.2011.12.021

  • Pennycook, A. (1994). The complex contexts of plagiarism: A reply to Deckert. Journal of Second Language Writing, 3, 277–284.

  • Pennycook, A. (1996). Borrowing others’ words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30, 201–203.

  • Potthast, M., Stein, B., Barrón-Cedeño, A., & Rosso, P. (2010). An evaluation framework for plagiarism detection. COLING’10: Proceedings of the 23rd International Conference on Computational Linguistics, 997–1005. https://www.aclweb.org/anthology/C10-2115

  • Real Academia Española: Banco de datos (CORDE) [online]. (n.d.). Corpus diacrónico del español. Retrieved February 16, 2021, from http://www.rae.es

  • Real Academia Española: Banco de datos (CREA) [online]. (n.d.). Corpus de referencia del español actual. Retrieved February 16, 2021, from http://www.rae.es

  • Rieber, R. W., & Stewart, W. A. (Eds.). (1990). The language scientist as expert in the legal setting. Annals of the New York academy of sciences, 606 (pp. 1–135). The New York Academy of Sciences.

  • Shuy, R. (2008). Fighting over words: Language and civil law cases. Oxford University Press.

  • Sousa-Silva, R. (2014). Detecting translingual plagiarism and the backlash against translation plagiarists. Language and Law/Linguagem e Direito, 1(1), 70–94.

  • Sousa-Silva, R. (2015). Reporter fired for plagiarism: A forensic linguistic analysis of news plagiarism. In Simões, Barreiro, Santos, Sousa-Silva, & Tagnin (Eds.), Linguística, informática e tradução: Mundos que se cruzam. Oslo Studies in Language, 7(1), 301–322.

  • Spanish Civil Procedure Act (LEC) 1/2000. (n.d.). BOE-A-2000-323. https://www.boe.es/buscar/doc.php?id=BOE-A-2000-323

  • Spanish Criminal Act (LECrim) 1882. (n.d.). BOE-A-1882-6036. https://www.boe.es/buscar/act.php?id=BOE-A-1882-6036

  • Spanish Criminal Code 2014. (n.d.). Clinter (Trans.). Ministry of Justice. Official State Gazette, 281. https://www.legislationline.org/download/id/6443/file/Spain_CC_am2013_en.pdf

  • Spanish Intellectual Property Act 2012. (n.d.). Clinter (Trans.). Ministry of Justice. Official State Gazette, 97. https://www.wipo.int/edocs/lexdocs/laws/en/es/es177en.pdf

  • Stamatatos, E. (2009). Intrinsic plagiarism detection. Using character n-gram profiles. In B. Stein, P. Rosso, E. Stamatatos, M. Koppel, & E. Agirre (Eds.), Proceedings of the SEPLN’09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (pp. 38–46). http://ceur-ws.org/Vol-502/pan09-proceedings.pdf

  • Stein, B., Lipka, N., & Prettenhofer, P. (2011). Intrinsic plagiarism analysis. Language Resources and Evaluation, 45(1), 63–82. https://doi.org/10.1007/s10579-010-9115-y

  • Turell, M. T. (2004). Textual kidnapping revisited: The case of plagiarism in literary translation. International Journal of Speech, Language and the Law, 11, 1–26.

  • Turell, M. T. (2008). Plagiarism. In J. Gibbons, & M. T. Turell (Eds.), Dimensions of forensic linguistics (pp. 265–299). John Benjamins Publishing Company.

  • Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188. https://doi.org/10.1613/jair.2934

  • Turnitin. http://turnitin.com/

  • van Dam, M. (2013). A basic character n-gram approach to authorship verification. Notebook for PAN at CLEF 2013. http://ceur-ws.org/Vol-1179/CLEF2013wn-PAN-vanDam2013.pdf

  • van Dijk, T. A. (2015). Context. In K. Tracy, C. Ilie, & T. Sandel (Eds.), The international encyclopedia of language and social interaction (1st ed., pp. 1–11). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118611463/wbielsi056

  • Willis, Sh. et al. (2015). ENFSI Guideline for Evaluative Reporting in Forensic Science. Strengthening the Evaluation of Forensic Results across Europe (STEOFRAE). https://enfsi.eu/wp-content/uploads/2016/09/m1_guideline.pdf

  • Woolls, D. (2002). CopyCatch Gold v2. CFL Software.

  • Woolls, D. (2010). Computational forensic linguistics. Searching for similarity in large specialised corpora. In M. Coulthard, & A. Johnson (Eds.), The Routledge handbook of forensic linguistics (pp. 576–590). Routledge.

  • Woolls, D. (2012). Detecting plagiarism. In P. M. Tiersma, & L. M. Solan (Eds.), The Oxford handbook of language and law (pp. 517–529). Oxford University Press.

Primary Sources

  • Baeza, R. (1980). Oscar Wilde. El príncipe feliz y otros cuentos. Bruguera.

  • Gómez de la Serna Puig, J. (1943). Oscar Wilde. Obras completas. Aguilar.

  • Montes, C. (1988). Oscar Wilde. Cuentos completos. Espasa Calpe.

  • Sarto, J. (2003). El ruiseñor y la rosa. Susaeta.

  • Wilde, O. (1888). The happy prince and other tales. Book from Project Gutenberg. [Online].

Author information

Corresponding author

Correspondence to Victoria Guillén-Nieto.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Guillén-Nieto, V. (2022). Plagiarism Detection: Methodological Approaches. In: Guillén-Nieto, V., Stein, D. (eds) Language as Evidence. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-84330-4_10

  • DOI: https://doi.org/10.1007/978-3-030-84330-4_10

  • Publisher Name: Palgrave Macmillan, Cham

  • Print ISBN: 978-3-030-84329-8

  • Online ISBN: 978-3-030-84330-4

  • eBook Packages: Social Sciences, Social Sciences (R0)
