Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barron-Cedeno, Alberto [1, 2]
Gupta, Parth [3]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; RETRIEVAL;
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Three reasons put plagiarism across languages on the rise: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most efforts for automatically detecting cross-language plagiarism depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA); three models that differ inherently in nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher values of precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
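As a rough illustration of the simplest of the three models named in the abstract, the sketch below compares two texts by the cosine similarity of their character n-gram counts. It is not the authors' implementation: the choice of n = 3, the raw-count weighting, the Latin-only normalisation, and the function names (`char_ngrams`, `cosine`, `cl_cng_similarity`) are all assumptions made for this example.

```python
import math
import re
import unicodedata
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after a simple normalisation (an assumption)."""
    # Lowercase and strip diacritics so that e.g. "detección" becomes "deteccion".
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep only Latin letters and digits; collapse everything else to one space.
    # This simplification would discard text in non-Latin scripts.
    cleaned = re.sub(r"[^a-z0-9]+", " ", stripped).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cl_cng_similarity(x: str, y: str, n: int = 3) -> float:
    """Hypothetical helper: character n-gram similarity of two texts."""
    return cosine(char_ngrams(x, n), char_ngrams(y, n))


if __name__ == "__main__":
    english = "Automatic plagiarism detection"
    spanish = "Detección automática de plagio"
    unrelated = "El perro come manzanas"
    print(cl_cng_similarity(english, spanish))
    print(cl_cng_similarity(english, unrelated))
```

A measure of this kind needs no translation resources, but it can only pick up overlap between languages that share a script and many cognates, which is consistent with the abstract's remark that the three models differ in the resources they require.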
Pages: 211-217
Page count: 7