Abstract
Internet applications now routinely cross language boundaries. This has brought many advantages, but it has also introduced side effects. One negative effect is cross-language plagiarism, also known as translated plagiarism.
In academic institutions, translated plagiarism appears in final projects, theses, papers, and similar work. This paper proposes a model for a web-based early-detection system for translated plagiarism and describes a prototype. The system translates the input document (written in Bahasa Indonesia) into English using the Google Translate API, then searches the World Wide Web for documents whose content is similar to the translated document. If any are found, the system downloads them and applies several preprocessing steps: removing punctuation, numbers, stop words, and repeated words, and lemmatizing the remaining words. Finally, it compares the content of both documents using a modified sentence-based detection algorithm (SBDA). The results show that the proposed method has a smaller error rate, which suggests better accuracy.
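The preprocessing and comparison stages described above can be illustrated with a minimal Python sketch. It is not the authors' modified SBDA, whose details the abstract does not give; it assumes a simple word-overlap score between sentences. The stop-word list, the `crude_lemma` helper (a rough stand-in for real lemmatization), the 0.8 threshold, and all function names are hypothetical choices made for illustration, and the translation and web-search steps are omitted.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "by"}


def crude_lemma(word):
    """Rough stand-in for lemmatization: strips a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(sentence):
    """Return the set of normalized words in one sentence.

    The regex keeps only alphabetic runs, which drops punctuation and numbers;
    using a set drops repeated words.
    """
    words = re.findall(r"[a-z]+", sentence.lower())
    return {crude_lemma(w) for w in words if w not in STOP_WORDS}


def split_sentences(text):
    """Split text on sentence-ending punctuation, skipping empty pieces."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_similarity(a, b):
    """Fraction of shared words relative to the smaller word set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def document_similarity(suspect, candidate, threshold=0.8):
    """Share of suspect sentences that have a close match in the candidate."""
    suspect_sents = [preprocess(s) for s in split_sentences(suspect)]
    suspect_sents = [s for s in suspect_sents if s]
    candidate_sents = [preprocess(s) for s in split_sentences(candidate)]
    if not suspect_sents:
        return 0.0
    matched = sum(
        1
        for s in suspect_sents
        if any(sentence_similarity(s, c) >= threshold for c in candidate_sents)
    )
    return matched / len(suspect_sents)
```

For example, calling `document_similarity` on an already-translated suspect text and a downloaded candidate returns a value between 0 and 1, where higher values mean more suspect sentences have a near match in the candidate. A production system would replace `crude_lemma` with a proper lemmatizer and tune the threshold on labeled data.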
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mustofa, K., Sir, Y.A. (2013). Early-Detection System for Cross-Language (Translated) Plagiarism. In: Mustofa, K., Neuhold, E.J., Tjoa, A.M., Weippl, E., You, I. (eds) Information and Communication Technology. ICT-EurAsia 2013. Lecture Notes in Computer Science, vol 7804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36818-9_3
DOI: https://doi.org/10.1007/978-3-642-36818-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36817-2
Online ISBN: 978-3-642-36818-9
eBook Packages: Computer Science (R0)