An IR-Based Approach Utilizing Query Expansion for Plagiarism Detection in MEDLINE

Cited by: 8
Authors
Nawab, Rao Muhammad Adeel [1]
Stevenson, Mark [2]
Clough, Paul [3]
Affiliations
[1] COMSATS Inst Informat Technol, Dept Comp Sci, Def Rd,Raiwind Rd, Lahore, Pakistan
[2] Univ Sheffield, Dept Comp Sci, Nat Language Proc Grp, 211 Portobello, Sheffield S1 4DP, S Yorkshire, England
[3] Univ Sheffield, Informat Sch, 211 Portobello, Sheffield S1 4DP, S Yorkshire, England
Keywords
Natural language processing; information retrieval; extrinsic plagiarism detection; MEDLINE; UMLS Metathesaurus; query expansion
DOI
10.1109/TCBB.2016.2542803
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The identification of duplicated and plagiarized passages of text has become an increasingly active area of research. In this paper, we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval is used to perform candidate document selection from MEDLINE, that is, the identification of a subset of potential source documents given a suspicious text. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. Results using the proposed IR-based approach outperform a state-of-the-art baseline based on Kullback-Leibler Distance.
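The candidate selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the cosine ranking, the toy `SYNONYMS` table (a stand-in for UMLS Metathesaurus lookups), the `DOCS` collection, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for UMLS Metathesaurus lookups.
SYNONYMS = {
    "heart": ["cardiac"],
    "attack": ["infarction"],
}

# Hypothetical three-document collection standing in for MEDLINE.
DOCS = [
    "cardiac infarction outcomes",
    "heart rate variability",
    "liver enzyme levels",
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def expand_query(tokens, synonyms=SYNONYMS):
    """Append the listed synonyms of each query token to the query."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF; an assumed weighting, not one taken from the paper.
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(suspicious, docs, expand=True):
    """Rank document indices by similarity to the suspicious text."""
    vectors, idf = tfidf_vectors(docs)
    tokens = tokenize(suspicious)
    if expand:
        tokens = expand_query(tokens)
    # Query terms absent from the collection cannot match and are dropped.
    query = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    # Highest score first; ties are broken by document index.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    print(rank_candidates("heart attack", DOCS, expand=False))
    print(rank_candidates("heart attack", DOCS, expand=True))
```

In this toy setting the suspicious text "heart attack" shares no words with the first document, so without expansion that document scores zero; once the synonyms are added to the query it moves to the top of the ranking. Resolving which CUI applies to an ambiguous term, which the abstract handles through Word Sense Disambiguation, is not modeled here.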
Pages: 796-804
Number of pages: 9