Word Embedding for High Performance Cross-Language Plagiarism Detection Techniques

Cited by: 0
Authors
Bouaine C. [1]
Benabbou F. [1]
Sadgali I. [1]
Affiliations
[1] Laboratory of Modeling and Information Technology, University Hassan II, Casablanca
Keywords
BERT; cross-language; Doc2Vec; FastText; GloVe; plagiarism; Sen2Vec; SLSTM; Word2Vec
DOI
10.3991/ijim.v17i10.38891
Abstract
Academic plagiarism has become a serious concern because it slows scientific progress and violates intellectual property. In this context, we present a study on detecting cross-language plagiarism using Natural Language Processing (NLP), embedding techniques, and deep learning. Many systems have been developed to tackle this problem, and many rely on machine learning and deep learning methods. In this paper, we propose a Cross-Language Plagiarism Detection (CL-PD) method based on Doc2Vec embeddings and a Siamese Long Short-Term Memory (SLSTM) model. Embedding techniques help capture the contextual meaning of the text and improve the performance of the CL-PD system. To show the effectiveness of our method, we conducted a comparative study with other techniques, such as GloVe, FastText, BERT, and Sen2Vec, on a dataset combining PAN11, JRC-Acquis, Europarl, and Wikipedia. The experiments for the Spanish-English language pair show that Doc2Vec+SLSTM achieves the best results among the compared models, with an accuracy of 99.81%, a precision of 99.75%, a recall of 99.88%, an F-score of 99.70%, and a very small loss in the test phase. © 2023, International Journal of Interactive Mobile Technologies. All Rights Reserved.
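For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F-score are conventionally derived from the confusion counts of a binary classifier (plagiarised pair vs. non-plagiarised pair). It assumes the F-score is the balanced F1 (harmonic mean of precision and recall), which the abstract does not state. The `Confusion` class, the `metrics` function, and the counts used are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion counts for a binary plagiarism classifier."""
    tp: int  # plagiarised pairs correctly flagged
    fp: int  # non-plagiarised pairs wrongly flagged
    fn: int  # plagiarised pairs that were missed
    tn: int  # non-plagiarised pairs correctly passed


def metrics(c: Confusion) -> dict:
    """Return accuracy, precision, recall and F1 (assumed balanced F-score)."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only; these are not taken from the paper.
    scores = metrics(Confusion(tp=90, fp=10, fn=30, tn=70))
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall, which makes it a useful sanity check when reading reported results.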
Pages: 69-91
Page count: 22