Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barron-Cedeno, Alberto [1, 2]
Gupta, Parth [3 ]
Rosso, Paolo [3 ]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; Retrieval
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Three factors are driving the rise of plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language other than their native one. Most approaches to automatic cross-language plagiarism detection depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages that covers the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA). The three models differ inherently in nature and in the resources they require. They are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
Pages: 211-217
Number of pages: 7
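
Of the three similarity models named in the abstract, CL-CNG is the simplest to illustrate. The following is a minimal sketch only, not the authors' implementation: the abstract gives no details, so the n-gram length (3), the normalisation (lower-casing, stripping punctuation), raw count weighting, and the cosine measure are all assumptions, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a text (n=3 is an assumed default)."""
    # Assumed normalisation: lower-case, then replace every run of
    # non-alphanumeric characters with a single space.
    normalised = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return Counter(normalised[i:i + n] for i in range(len(normalised) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no n-grams
    return dot / (norm_a * norm_b)


def cl_cng_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Hypothetical helper: character n-gram similarity of two texts."""
    return cosine(char_ngrams(doc_a, n), char_ngrams(doc_b, n))


if __name__ == "__main__":
    english = "information retrieval"
    spanish = "recuperación de información"
    print(cl_cng_similarity(english, spanish))
```

A model of this kind needs no translation resources, but it can only find overlap between languages that share a script and some common vocabulary, such as the English and Spanish phrases in the usage example.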