Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot

Times cited: 91
Authors
Klein, Martin [1]
Van de Sompel, Herbert [1]
Sanderson, Robert [1]
Shankar, Harihar [1]
Balakireva, Lyudmila [1]
Zhou, Ke [2]
Tobin, Richard [2]
Affiliations
[1] Los Alamos National Laboratory, Digital Library Research & Prototyping Team, Research Library, Los Alamos, NM 87545, USA
[2] University of Edinburgh, Language Technology Group, Edinburgh, Midlothian, Scotland
Funding
Andrew W. Mellon Foundation (USA)
Keywords
INTERNET REFERENCES; WEB; PERSISTENCE; DECAY; JOURNALS; GONE; TIME
DOI
10.1371/journal.pone.0115253
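Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline codes
07; 0710; 09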
Abstract
The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. However, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot: the combination of link rot and content drift that affects references to web resources included in Science, Technology, and Medicine (STM) articles. We investigate the extent to which reference rot impairs the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora spanning publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find that one out of five STM articles suffers from reference rot, meaning it is impossible to revisit the web context that surrounds these articles some time after their publication. When considering only STM articles that contain references to web resources, this fraction increases to seven out of ten. We suggest that, in order to safeguard the long-term integrity of the web-based scholarly record, robust solutions to combat the reference rot problem are required. In conclusion, we provide a brief insight into the directions being explored in this regard in the context of the Hiberlink project.
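The measurement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Reference` and `Article` classes, the 14-day `REPRESENTATIVE_WINDOW`, the use of the publication date as the reference date, and the rule that an article suffers reference rot when at least one of its web references lacks a representative archived snapshot are all hypothetical choices made for illustration. Live-web status codes and snapshot capture dates are treated as already-collected inputs, and the `example.org` URIs are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical threshold: an archived snapshot counts as "representative"
# if it was captured within this many days of the reference date.
REPRESENTATIVE_WINDOW = timedelta(days=14)


@dataclass
class Reference:
    uri: str
    # Final HTTP status on the live web after following redirects;
    # None means the URI did not respond at all.
    live_status: Optional[int]
    # Capture dates of snapshots found in web archives.
    snapshot_dates: List[date] = field(default_factory=list)


@dataclass
class Article:
    published: date  # assumed to be the date the resources were referenced
    references: List[Reference] = field(default_factory=list)


def has_link_rot(ref: Reference) -> bool:
    """Link rot: the URI no longer returns a successful (2xx) response."""
    return ref.live_status is None or not 200 <= ref.live_status < 300


def has_representative_snapshot(ref: Reference, referenced_on: date) -> bool:
    """True if some archived snapshot lies within the window around referenced_on."""
    return any(abs(d - referenced_on) <= REPRESENTATIVE_WINDOW
               for d in ref.snapshot_dates)


def suffers_reference_rot(article: Article) -> bool:
    # Assumption: a responsive URI alone does not rule out content drift, so the
    # web context is revisitable only if every reference has a representative snapshot.
    return any(not has_representative_snapshot(r, article.published)
               for r in article.references)


def rot_fraction(articles: List[Article], only_with_web_refs: bool = False) -> float:
    """Fraction of articles suffering reference rot."""
    pool = [a for a in articles if a.references] if only_with_web_refs else articles
    if not pool:
        return 0.0
    return sum(suffers_reference_rot(a) for a in pool) / len(pool)


if __name__ == "__main__":
    pub = date(2012, 6, 1)
    archived = Article(pub, [Reference("http://example.org/a", 200, [date(2012, 6, 5)])])
    gone = Article(pub, [Reference("http://example.org/b", 404)])
    no_web_refs = Article(pub)
    corpus = [archived, gone, no_web_refs]
    print(rot_fraction(corpus))                           # 1 of 3 articles
    print(rot_fraction(corpus, only_with_web_refs=True))  # 1 of 2 articles
```

The `only_with_web_refs` flag mirrors the abstract's distinction between all STM articles and only those that contain references to web resources; computing both over the same corpus shows why the second fraction is higher, since articles without web references can never suffer reference rot under this rule.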
Pages: 39