Developing a corpus of plagiarised short answers

Cited by: 65
Authors
Clough, Paul [1]
Stevenson, Mark [2]
Affiliations
[1] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Keywords
Plagiarism; Plagiarism detection; Corpus creation; Language resources; Paraphrase
DOI
10.1007/s10579-009-9112-1
CLC classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Plagiarism is widely acknowledged to be a significant and increasing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, has been proposed to assist the educator in identifying plagiarised work, or even to detect it automatically. Direct comparison of these systems is made difficult by the problems in obtaining genuine examples of plagiarised student work. We describe our initial experiences with constructing a corpus consisting of answers to short questions in which plagiarism has been simulated. This corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.
Pages: 5-24
Number of pages: 20
Related papers
52 in total
[1]  
[Anonymous], TR19975 U GLASG DEP
[2]  
[Anonymous], 2004, ACE 04 P 6 AUSTR C C
[3]  
[Anonymous], 2004, P INT C COMP LING
[4]  
[Anonymous], COMPUT HIGHER ED EC
[5]  
[Anonymous], 1994, Journal of Information Ethics
[6]  
[Anonymous], 2000, Plagiarism in natural and programming languages: an overview of current tools and technologies
[7]  
Barzilay R, 2001, 39TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P50
[8]  
BRIN S, 1995, P 1995 ACM SIGMOD IN, P398
[9]   On the resemblance and containment of documents [J].
Broder, AZ.
COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998: 21-29
[10]  
Bull J., 2001, TECHNICAL REV PLAGIA