Abstract
Plagiarism occurs when someone uses another person's work without due acknowledgment. Text similarity plays a role in several fields, such as web document retrieval, information mining, and the search for related articles. Several approaches have been introduced for detecting plagiarism in text documents, based on the syntactic structure of the text, string similarity, fingerprinting, the semantic meaning underlying the text, and so on. The main limitation of current plagiarism detection systems is that they fail to detect difficult cases of plagiarism. The proposed plagiarism detection approach is a hybrid of semantic and syntactic similarity between text documents. This novel approach exploits linguistic information sources non-linearly, using a lexical database to find the relatedness between text documents. The proposed approach uses semantic knowledge to perform cognitive-inspired computing. The framework is capable of detecting intelligent plagiarism cases such as verbatim copying, paraphrasing, rewording within a sentence, and sentence transformation. The approach has been evaluated on the standard PAN-PC-11 dataset. The experiments show that our technique outperforms other strong baseline techniques in terms of precision, recall, F-measure, and plagiarism detection (PlagDet) score.
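The abstract does not spell out how the similarity is computed, so what follows is only a minimal sketch of the general idea of a hybrid score, under stated assumptions. Word-set Jaccard overlap stands in for the syntactic component. A tiny hand-written synonym table (`TOY_SYNONYMS`, hypothetical) stands in for a lexical database such as WordNet. The two scores are mixed linearly with an assumed weight `alpha`, which is simpler than the non-linear combination the abstract describes. The `plagdet` helper follows the standard PAN definition, F1 divided by log2(1 + granularity). All names here are illustrative, not the authors' code.

```python
import math

# Hypothetical toy lexical resource standing in for a database such as WordNet.
TOY_SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "quick": {"fast", "rapid"},
    "fast": {"quick", "rapid"},
    "buy": {"purchase"},
    "purchase": {"buy"},
}

PUNCT = ".,;:!?"


def tokens(sentence):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in sentence.split() if w.strip(PUNCT)]


def syntactic_similarity(a, b):
    """Surface-level score: Jaccard overlap of the two word sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def word_relatedness(w1, w2):
    """Assumed scores: 1.0 for identical words, 0.8 for listed synonyms, else 0.0."""
    if w1 == w2:
        return 1.0
    if w2 in TOY_SYNONYMS.get(w1, set()):
        return 0.8
    return 0.0


def semantic_similarity(a, b):
    """Average best-match word relatedness, taken in both directions."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0

    def directed(x, y):
        return sum(max(word_relatedness(w, v) for v in y) for w in x) / len(x)

    return (directed(ta, tb) + directed(tb, ta)) / 2


def hybrid_similarity(a, b, alpha=0.5):
    """Linear mix of the two scores; alpha is an assumed weight."""
    return alpha * semantic_similarity(a, b) + (1 - alpha) * syntactic_similarity(a, b)


def plagdet(precision, recall, granularity):
    """PAN PlagDet score: F1 divided by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


if __name__ == "__main__":
    s1 = "I will buy a fast car"
    s2 = "I will purchase a quick automobile"
    print(syntactic_similarity(s1, s2), semantic_similarity(s1, s2), hybrid_similarity(s1, s2))
    print(plagdet(0.8, 0.6, 1.0))
```

For the reworded pair "I will buy a fast car" and "I will purchase a quick automobile", the Jaccard overlap is only 1/3 while the synonym-aware score is 0.9. This illustrates why rewording can slip past purely string-based checks. A granularity of 1 (each plagiarized case detected as a single piece) leaves PlagDet equal to F1. Fragmented detections push the score down.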
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Sahi, M., Gupta, V. A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources. Cogn Comput 9, 852–867 (2017). https://doi.org/10.1007/s12559-017-9502-4
DOI: https://doi.org/10.1007/s12559-017-9502-4