Computer-Generated Text Detection Using Machine Learning: A Systematic Review

Cited by: 17
Authors
Beresneva, Daria [1]
Affiliations
[1] Russian Acad Natl Econ & Publ Adm, Moscow Inst Phys & Technol, Antiplagiat Res, Moscow, Russia
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2016 | 2016 / Volume 9612
Keywords
Artificial content; Generated text; Fake content detection
DOI
10.1007/978-3-319-41754-7_43
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Computer-generated, or artificial, text is now abundant on the web, ranging from basic random word salads to content produced by web scraping. In this paper, we present a short version of a systematic review of existing automated methods for distinguishing natural texts from artificially generated ones. The methods were selected according to specific criteria. We then summarize the methods considered. Comparisons, whenever possible, use common evaluation measures and control for differences in experimental set-up.
Pages: 421-426
Number of pages: 6