Abstract
This chapter deals with plagiarism detection. After explaining the difference between plagiarism and copyright infringement, the chapter analyses the problems that challenge the expert linguist’s work, especially the undervaluation of scientific linguistic expertise in the courts of justice, the admissibility of scientific evidence and the evaluation of text similarity. The chapter then examines plagiarism frameworks and addresses the latest research in computer-based plagiarism detection methods and their implementation in automated plagiarism detection systems. It also points to the essential complementary role that qualitative linguistic analysis plays in plagiarism detection and draws attention to the relevance of context in understanding and interpreting the data appropriately. Lastly, the chapter provides the reader with a detailed step-by-step analysis of a live case of plagiarism between translators.
Notes
- 1.
European Parliament. (2018). Copyright Law in the EU. Salient Features of Copyright Law across the EU Member States. European Parliamentary Research Service. Study. Retrieved from https://www.europarl.europa.eu/RegData/etudes/STUD/2018/625126/EPRS_STU(2018)625126_EN.pdf.
- 2.
Spanish law, which belongs to a civil law jurisdiction, explicitly protects authors’ moral rights under art. 14 (Content and Characteristics of Moral Rights) of the Intellectual Property Act 1/1996: (1) The right to disclosure; (2) The right to determine how communication with the public should be effected; (3) The right to claim authorship; (4) The right to demand respect for the integrity of the work; (5) The right to modify the work with the permission of the copyright holder; (6) The right to withdraw the work due to changes in intellectual or moral convictions and (7) The right of access to the sole or rare copy of the work.
- 3.
The other three enforceable limitations to the general public’s freedom of speech are patents, trademarks—and service marks—and trade secrets.
- 4.
Copyright limitations are transnational in scope for most countries due to international treaties such as the Berne Convention of 1886, the UNESCO Universal Copyright Convention of 1952, the World Trade Organisation’s TRIPS Agreement of 1995 and the WIPO Copyright Treaty of 1996.
- 5.
By way of example, the author has acted as an expert linguist in only four plagiarism cases over the last thirteen years, of which only two were court cases. In these two, the author acted as expert for the defendant. One case related to plagiarism between lawyers (Guillén-Nieto, 2020b); the other concerned electronic material allegedly plagiarised in a teaching project.
- 6.
Retrieved January 3, 2021, from https://www.law.cornell.edu/wex/frye_standard.
- 7.
Retrieved January 3, 2021, from https://www.law.cornell.edu/wex/daubert_standard.
- 8.
Retrieved from https://www.boe.es/buscar/doc.php?id=BOE-A-2000-323.
- 9.
Retrieved from https://www.boe.es/buscar/act.php?id=BOE-A-1882-6036.
- 10.
Kraus (2016) offers a comprehensive review of plagiarism detection systems and evaluation methods until 2012. Furthermore, Foltýnek et al. (2019) provide an exhaustive critical review evaluating the capabilities of computer-based academic plagiarism detection methods from 2013 to 2018. Over this period one can see major advances concerning the automated detection of obfuscated academic plagiarism forms.
- 11.
PAN is a well-established platform for the comparative evaluation of authorship identification and plagiarism detection methods and systems. Retrieved from https://pan.webis.de/.
- 12.
Retrieved from https://www.ithenticate.com/.
- 13.
- 14.
Retrieved from https://www.turnitin.com/.
- 15.
Retrieved from https://unicheck.com/es-es.
- 16.
Retrieved from https://www.articlechecker.com/.
- 17.
Retrieved from https://www.copyscape.com/.
- 18.
Retrieved from https://antiplagiarist.softonic.com/.
- 19.
Retrieved from https://www.duplichecker.com/.
- 20.
Retrieved from https://www.plagium.com/.
- 21.
Retrieved from https://plag.co/.
- 22.
For reasons of confidentiality, reference to the suspect translator is omitted.
- 23.
Woolls (2012) explains that ʻin order to avoid over-matching, function words, due to their high frequencies in a language, are collected together on what is termed a “stop-list” and discounted altogether for vocabulary comparison purposesʼ (p. 525).
- 24.
CREA (3.2 June 2008) is a database of contemporary Spanish containing 160,000,000 linguistic forms from written and oral texts produced in all Spanish-speaking countries from 1975 to 2004. The written texts were selected from books, journals and magazines.
- 25.
CORDE is a database of diachronic Spanish. It contains 250 million linguistic forms from a wide range of genres from the Spanish language’s origins until 1974.
References
Ainsworth, J., & Juola, P. (2019). Who wrote this? Modern forensic authorship analysis as a model for valid forensic science. Washington University Law Review, 96(5), 1161–1189.
Bakhtin, M. (1981). The dialogic imagination: Four essays (Ed. M. Holquist; Trans. C. Emerson, & M. Holquist). Austin: University of Texas Press.
Bazerman, C. (2004). Intertextuality: How texts rely on other texts. In C. Bazerman, & P. Prior (Eds.), What writing does and how it does it (pp. 309–339). Lawrence Erlbaum.
Butters, R. R. (2008). Trademarks and other proprietary terms. In J. Gibbons, & M. Teresa Turell (Eds.), Dimensions of forensic linguistics (pp. 231–247). John Benjamins Publishing Company.
Butters, R. R. (2012). Language and copyright. In P. M. Tiersma, & L. M. Solan (Eds.), The Oxford handbook of language and law (pp. 463–477). Oxford University Press.
Chaski, C. (2013). Best practices and admissibility of forensic author identification. Journal of Law and Policy, 21, 333–376. https://brooklynworks.brooklaw.edu/jlp/vol21/iss2/5
Chatterjee-Padmanabhen, M. (2014). Bakhtin’s theory of heteroglossia/intertextuality in teaching academic writing in higher education. Journal of Academic Language & Learning, 8(3), A101–A112.
Copyright, Designs and Patents Act. 1988. https://www.legislation.gov.uk/ukpga/1988/48/contents
Coulthard, M., & Johnson, A. (Eds.). (2007). An Introduction to forensic linguistics: Language in evidence. Routledge.
Coulthard, M., Johnson, A., Kredens, K., & Woolls, D. (2010). Four forensic linguists’ responses to suspected plagiarism. In M. Coulthard, & A. Johnson (Eds.), An introduction to forensic linguistics: Language in evidence (pp. 523–538). Routledge.
Daubert Standard. (1993). Retrieved on January 8, 2021, from https://www.law.cornell.edu/wex/daubert_standard
Daubert v Merrell Dow Pharmaceuticals. (1993). (US). https://caselaw.findlaw.com/us-supreme-court/509/579.html
De Luca, S., Navarro, F., & Cameriere, R. (2013). La prueba pericial y su valoración en el ámbito judicial español. Revista electrónica de ciencia penal y criminología. Artículos RECPC, 15(19), 1–14.
Eggington, W. G. (2008). Deception and fraud. In J. Gibbons & M. T. Turell (Eds.), Dimensions of forensic linguistics (pp. 249–264). John Benjamins Publishing Company.
Ehrhardt, S. (2018). Authorship attribution analysis. In J. Visconti (Ed.), Handbook of communication in the legal sphere (pp. 169–200). Boston.
European Parliament. (2018). Copyright law in the EU. Salient features of copyright law across the EU member states. European Parliamentary Research Service. Study. https://www.europarl.europa.eu/RegData/etudes/STUD/2018/625126/EPRS_STU(2018)625126_EN.pdf
Foltýnek, T., Meuschke, N., & Gipp, B. (2019). Academic plagiarism detection: A systematic literature review. ACM Computing Surveys, 52(6), 1–42. https://doi.org/10.1145/3345317
Franco-Salvador, M., Gupta, P., & Rosso, P. (2013). Cross-language plagiarism detection using a multilingual semantic network. In P. Serdyukov et al. (Eds.), Advances in information retrieval. ECIR 2013. Lecture notes in computer science (pp. 710–713). Springer. https://doi.org/10.1007/978-3-642-36973-5_66
Frye Standard. (1923). Retrieved on January 6, 2021, from https://www.law.cornell.edu/wex/frye_standard
Frye v United States. (1923). 293 F. 1013 (US). https://www.mass.gov/doc/frye-v-united-states-293-f-1013-dc-cir-1923/download
Gamini Fonseka, E. A. (2020). Sacrifice unacknowledged: A literary analysis of the nightingale and the rose by Oscar Wilde. American Research Journal of English and Literature, 6(1), 1–8. https://doi.org/10.21694/2378-9026.20010
Gil, L., Soler, C., Stuart, K., & Candela, J. (2004). TextWorks. Departamento de Idiomas, Universidad Politécnica de Valencia.
Gipp, B. (2014). Citation-based plagiarism detection. In Citation-based plagiarism detection (pp. 57–88). Springer. https://doi.org/10.1007/978-3-658-06394-8_4
Green, S. P. (2002). Plagiarism, norms, and the limits of the theft law: Some observations on the use of criminal sanctions in enforcing intellectual property rights. Hastings Law Journal, 54(1), 167–242. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=315562
Guillén-Nieto, V. (2011). The expert as witness in the CTM courts. International Journal of Applied Linguistics (ITL), 162, 63–83.
Guillén-Nieto, V. (2020a). Defamation as a language crime: A socio-pragmatic approach to defamation cases in the high courts of justice of Spain. International Journal of Language & Law (JLL), 9, 1–22.
Guillén-Nieto, V. (2020b). The relevance of context in plagiarism detection: The case of a professional legal genre. Ibérica, 40, 101–122.
Guillén-Nieto, V. (2021). ʻWhat else can you do to pass…?ʼ A pragmatics-based approach to quid-pro-quo sexual harassment. In J. Giltrow, F. Olsen, & D. Mancini (Eds.), Legal meanings and language rights. International, social and philosophical perspectives (pp. 31–55). de Gruyter Mouton. https://doi.org/10.1515/9783110720969
Hage, J., Rademaker, P., & van Vugt, N. (2010). A comparison of plagiarism detection tools. In Technical report UU-CS-2010-2015 (pp. 1–26). Department of Information and Computing Sciences, Utrecht University. http://www.cs.uu.nl/research/techreps/repo/CS-2010/2010-015.pdf
Hussain, F., & Suryani, M. A. (2015). On retrieving intelligently plagiarized documents using semantic similarity. Engineering Applications of Artificial Intelligence, 45, 246–258. https://doi.org/10.1016/j.engappai.2015.07.011
Kraus, C. (2016). Plagiarism detection: State-of-the-art systems (2016) and evaluation methods. Retrieved from arXiv:1603.03014v1 [cs.IR].
Kristeva, J. (1980). The bounded text. In L. Roudiez, T. Gora, & A. Jardine (Eds.), Desire in language: A semiotic approach to literature and art (pp. 36–63). Columbia University Press.
Love, H. (2002). Attributing authorship. An introduction. Cambridge University Press.
Lukashenko, R., Graudina, V., & Grundspenkis, J. (2007). Computer-based plagiarism detection methods and tools: An overview. Proceedings of the 2007 International Conference on Computer Systems and Technologies, 18, 1–6. https://dl.acm.org/doi/10.1145/1330598.1330642
Meuschke, N., Gipp, B., & Lipinski, M. (2015). CITREC: An evaluation framework for citation-based similarity measures based on TREC genomics and PubMed Central. In iConference 2015 Proceedings. http://hdl.handle.net/2142/73680
Meuschke, N., Schubotz, M., Hamborg, F., Skopal, T., & Gipp, B. (2017). Analyzing mathematical content to detect academic plagiarism. CIKM’17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2211–2214. https://doi.org/10.1145/3132847.3133144
Meyer zu Eissen, S., & Stein, B. (2006). Intrinsic plagiarism detection. Lecture Notes in Computer Science, 3936, 565–569. https://doi.org/10.1007/11735106_66
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. Computation and Language, 1–12. https://arxiv.org/abs/1301.3781
Nicklaus, M., & Stein, D. (2020). The role of linguistics in veracity evaluation. International Journal of Language & Law (JLL), 9, 23–47.
Osman, A. H., Salim, N., Binwahlan, M. S., Alteeb, R., & Abuobieda, A. (2012). An improved plagiarism detection scheme based on semantic role labelling. Applied Soft Computing, 12(5), 1493–1502. https://doi.org/10.1016/j.asoc.2011.12.021
Pennycook, A. (1994). The complex contexts of plagiarism: A reply to Deckert. Journal of Second Language Writing, 3, 277–284.
Pennycook, A. (1996). Borrowing others’ words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30, 201–203.
Potthast, M., Stein, B., Barrón-Cedeño, A., & Rosso, P. (2010). An evaluation framework for plagiarism detection. In COLING’10: Proceedings of the 23rd International Conference on Computational Linguistics, 997–1005. https://www.aclweb.org/anthology/C10-2115
Real Academia Española: Banco de datos (CORDE) [online]. (n.d.). Corpus diacrónico del español. Retrieved February 16, 2021, from http://www.rae.es
Real Academia Española: Banco de datos (CREA) [online]. (n.d.). Corpus de referencia del español actual. Retrieved February 16, 2021, from http://www.rae.es
Rieber, R. W., & Stewart, W. A. (Eds.). (1990). The language scientist as expert in the legal setting. Annals of the New York academy of sciences, 606 (pp. 1–135). The New York Academy of Sciences.
Shuy, R. (2008). Fighting over words: Language and civil law cases. Oxford University Press.
Sousa-Silva, R. (2014). Detecting translingual plagiarism and the backlash against translation plagiarists. Language and Law/Linguagem e Direito, 1(1), 70–94.
Sousa-Silva, R. (2015). ʻReporter fired for plagiarism: A forensic linguistic analysis of news plagiarismʼ. In Simões, Barreiro, Santos, Sousa-Silva, & Tagnin (Eds.), Linguística, informática e tradução: Mundos que se cruzam. Oslo Studies in Language, 7(1), 301–322.
Spanish Civil Procedure Act (LEC) 1/2000. (n.d.). BOE-A-2000-323. https://www.boe.es/buscar/doc.php?id=BOE-A-2000-323
Spanish Criminal Act (LECrim) 1882. (n.d.). BOE-A-1882-6036. https://www.boe.es/buscar/act.php?id=BOE-A-1882-6036
Spanish Criminal Code 2014. (n.d.). Clinter (Trans.). Ministry of Justice. Official State Gazette, 281. https://www.legislationline.org/download/id/6443/file/Spain_CC_am2013_en.pdf
Spanish Intellectual Property Act 2012. (n.d.). Clinter (Trans.). Ministry of Justice. Official State Gazette, 97. https://www.wipo.int/edocs/lexdocs/laws/en/es/es177en.pdf
Stamatatos, E. (2009). Intrinsic plagiarism detection. Using character n-gram profiles. In B. Stein, P. Rosso, E. Stamatatos, M. Koppel, & E. Agirre (Eds.), Proceedings of the SEPLN’09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (pp. 38–46). http://ceur-ws.org/Vol-502/pan09-proceedings.pdf
Stein, B., Lipka, N., & Prettenhofer, P. (2011). Intrinsic plagiarism analysis. Language Resources and Evaluation, 45(1), 63–82. https://doi.org/10.1007/s10579-010-9115-y
Turell, M. T. (2004). Textual kidnapping revisited: The case of plagiarism in literary translation. International Journal of Speech, Language and the Law, 11, 1–26.
Turell, M. T. (2008). Plagiarism. In J. Gibbons, & M. T. Turell (Eds.), Dimensions of forensic linguistics (pp. 265–299). John Benjamins Publishing Company.
Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188. https://doi.org/10.1613/jair.2934
Turnitin. http://turnitin.com/
van Dam, M. (2013). A basic character n-gram approach to authorship verification. Notebook for PAN at CLEF 2013. http://ceur-ws.org/Vol-1179/CLEF2013wn-PAN-vanDam2013.pdf
van Dijk, T. A. (2015). Context. In K. Tracy, C. Ilie, & T. Sandel (Eds.), The international encyclopedia of language and social interaction (1st ed., pp. 1–11). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118611463.wbielsi056
Willis, Sh. et al. (2015). ENFSI Guideline for Evaluative Reporting in Forensic Science. Strengthening the Evaluation of Forensic Results across Europe (STEOFRAE). https://enfsi.eu/wp-content/uploads/2016/09/m1_guideline.pdf
Woolls, D. (2002). CopyCatch Gold v2. CFL Software.
Woolls, D. (2010). Computational forensic linguistics. Searching for similarity in large specialised corpora. In M. Coulthard, & A. Johnson (Eds.), The Routledge handbook of forensic linguistics (pp. 576–590). Routledge.
Woolls, D. (2012). Detecting plagiarism. In P. M. Tiersma, & L. M. Solan (Eds.), The Oxford handbook of language and law (pp. 517–529). Oxford University Press.
Primary Sources
Baeza, R. (1980). Oscar Wilde. El príncipe feliz y otros cuentos. Bruguera.
Gómez de la Serna Puig, J. (1943). Oscar Wilde. Obras completas. Aguilar.
Montes, C. (1988). Oscar Wilde. Cuentos completos. Espasa Calpe.
Sarto, J. (2003). El ruiseñor y la rosa. Susaeta.
Wilde, O. (1888). The happy prince and other tales. Book from Project Gutenberg. [Online].
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Guillén-Nieto, V. (2022). Plagiarism Detection: Methodological Approaches. In: Guillén-Nieto, V., Stein, D. (eds) Language as Evidence. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-84330-4_10
DOI: https://doi.org/10.1007/978-3-030-84330-4_10
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-030-84329-8
Online ISBN: 978-3-030-84330-4
eBook Packages: Social Sciences (R0)