InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives

Cited by: 12
Authors
Kelly, Mat [1]
Alam, Sawood [1]
Nelson, Michael L. [1]
Weigle, Michele C. [1]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
Source
RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, TPDL 2016 | 2016 / Vol. 9819
Funding
National Science Foundation (USA);
Keywords
Web archives; Memento; Peer-to-peer; IPFS;
DOI
10.1007/978-3-319-43997-6_35
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We have integrated Web ARChive (WARC) files with the peer-to-peer, content-addressable InterPlanetary File System (IPFS) to allow the payload content of web archives to be easily propagated. We also provide an archival replay system, extended from pywb, that fetches the WARC content from IPFS and re-assembles the originally archived HTTP responses for replay. Using a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we show that extracting the HTTP response content of the WARCs and indexing it with IPFS lookup hashes takes 66.6 minutes, inclusive of dissemination into IPFS.
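The index-then-replay flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ContentStore` class is a hypothetical in-memory stand-in for IPFS (using SHA-256 hex digests rather than real IPFS hashes), the index is a plain dictionary rather than a pywb index file, and storing the HTTP header and payload as two separate objects is an assumption of this sketch.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressable store like IPFS."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # The lookup key is derived from the content itself, so identical
        # content always maps to the same key (and is stored only once).
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def cat(self, key: str) -> bytes:
        return self._blocks[key]


def split_http_response(raw: bytes):
    """Split a raw HTTP response into (header block incl. blank line, payload)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    return head + sep, body


def index_response(store, index, uri, timestamp, raw):
    """Push one archived response into the store and record its lookup hashes."""
    header, payload = split_http_response(raw)
    entry = {
        "header_hash": store.add(header),    # assumption: header stored separately
        "payload_hash": store.add(payload),
    }
    index[(uri, timestamp)] = entry
    return entry


def replay(store, index, uri, timestamp):
    """Re-assemble the originally archived HTTP response from the store."""
    entry = index[(uri, timestamp)]
    return store.cat(entry["header_hash"]) + store.cat(entry["payload_hash"])


if __name__ == "__main__":
    store, index = ContentStore(), {}
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    index_response(store, index, "http://example.com/", "20160101000000", raw)
    # The replayed bytes should match the archived response exactly.
    print(replay(store, index, "http://example.com/", "20160101000000") == raw)
```

Because keys are derived from content, two captures with byte-identical payloads share one stored object in this sketch, which is the property that makes a content-addressable store attractive for propagating archive payloads.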
Pages: 411-416
Page count: 6