A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources

Sahi, Mansi; Gupta, Vishal

doi:10.1007/s12559-017-9502-4

Published: 22 August 2017

Cognitive Computation, volume 9, pages 852–867 (2017)

Mansi Sahi¹ & Vishal Gupta¹

Abstract

Plagiarism occurs when a person's work is used without due acknowledgment. Text similarity underlies several related tasks, such as web document retrieval, information mining, and searching for related articles. Several approaches have been introduced for detecting plagiarism in text documents, based on the syntactic structure of the text, string similarity, fingerprinting, the semantic meaning underlying the text, and so on. The basic limitation of current plagiarism detection systems is that they fail to detect difficult cases of plagiarism. The proposed plagiarism detection approach is a hybrid of semantic and syntactic similarity between text documents. This novel approach exploits linguistic information sources non-linearly, using a lexical database to find the relatedness between text documents. The proposed approach uses semantic knowledge to perform cognitive-inspired computing. The framework is capable of detecting intelligent plagiarism cases such as verbatim copying, paraphrasing, rewording within a sentence, and sentence transformation. The approach has been evaluated on the standard PAN-PC-11 dataset. The experiments show that the technique outperforms other strong baseline techniques in terms of precision, recall, F-measure, and plagiarism detection (PlagDet) score.
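To make the evaluation measures concrete, the following is a minimal sketch of how the F-measure and the PlagDet score relate, assuming the definition commonly used in the PAN evaluations: the harmonic mean of precision and recall divided by log2(1 + granularity). The abstract does not spell out the formula, so this definition, the function names, and the input values are assumptions for illustration only; they are not results from the article.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet score, assuming the PAN-style definition.

    granularity is the average number of detections reported per
    detected plagiarism case; it is 1.0 when every case is reported
    exactly once, and larger when cases are reported in fragments.
    """
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


if __name__ == "__main__":
    # Made-up illustrative values, not figures from the article.
    print(plagdet(precision=0.8, recall=0.8, granularity=1.0))
    print(plagdet(precision=0.5, recall=1.0, granularity=3.0))
```

With a granularity of 1 the denominator is log2(2) = 1, so PlagDet equals the F-measure; a detector that splits one plagiarized passage into several detections is penalized through the larger granularity.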

Author information

Authors and Affiliations

University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Mansi Sahi & Vishal Gupta

Corresponding author

Correspondence to Vishal Gupta.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Sahi, M., Gupta, V. A Novel Technique for Detecting Plagiarism in Documents Exploiting Information Sources. Cogn Comput 9, 852–867 (2017). https://doi.org/10.1007/s12559-017-9502-4

Received: 05 September 2016
Accepted: 01 August 2017
Published: 22 August 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s12559-017-9502-4
