Prevalence of nonsensical algorithmically generated papers in the scientific literature

Cited by: 29
Authors
Cabanac, Guillaume [1]
Labbe, Cyril [2]
Affiliations
[1] Univ Toulouse, Comp Sci Dept, CNRS, IRIT, UMR 5505, 118 Route Narbonne, F-31062 Toulouse, France
[2] Univ Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
Keywords
GOOGLE SCHOLAR
DOI
10.1002/asi.24495
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
In 2014, leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed, and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it achieves 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen papers from 19 publishers. We estimate the prevalence of SCIgen papers at 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with, through either formal retraction (12) or silent removal (34). Publishers still serve, and sometimes sell, the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer review and to chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
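The headline figures in the abstract are internally consistent: the 19% share and the 197 remaining papers both follow from the retraction and removal counts. A minimal Python check, using only numbers stated in the abstract (the variable names are illustrative, not taken from the paper):

```python
# Tallies stated in the abstract for the 243 detected SCIgen papers.
detected = 243
retracted = 12          # formal retractions
silently_removed = 34   # removed without a retraction notice

dealt_with = retracted + silently_removed   # papers publishers acted on
still_served = detected - dealt_with        # papers still online
share_dealt_with = dealt_with / detected    # fraction acted on

print(dealt_with, still_served, f"{share_dealt_with:.0%}")
```

Running it prints `46 197 19%`, matching the abstract: 46 of 243 papers (about 18.9%, rounded to 19%) were handled, leaving 197.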
Pages: 1461-1476
Number of pages: 16