Routing Memento Requests Using Binary Classifiers

Cited by: 11
Authors
Bornand, Nicolas J. [1 ]
Balakireva, Lyudmila [1 ]
Van de Sompel, Herbert [1 ]
Affiliation
[1] Los Alamos Natl Lab, Los Alamos, NM 87544 USA
Source
2016 IEEE/ACM JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL) | 2016
Keywords
DOI
10.1145/2910896.2910899
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The Memento protocol provides a uniform approach to querying individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced to support querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly challenging to deliver aggregate responses while keeping response times and computational costs under control. Ad-hoc heuristic approaches have been introduced to address this challenge, and research has aimed at optimizing query routing based on archive profiles. In this paper, we explore the use of binary, archive-specific classifiers, generated from the content cached by an Aggregator, to determine whether or not to query an archive for a given URI. Our results are readily applicable and can help to significantly decrease both the number of requests and the overall response times without compromising on recall. We find, among other results, that classifiers can reduce the average number of requests by 77% compared to a brute-force approach that queries all archives, and the overall response time by 42%, while maintaining a recall of 0.847.
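The routing idea in the abstract can be illustrated with a minimal sketch: one binary classifier per archive, trained on cached (URI, held or not held) observations, decides whether that archive is worth querying. The abstract does not state which features or learning algorithm the authors used, so everything below is an assumption made for illustration: the URI tokenization, the simplified naive Bayes scoring, the archive names, the sample cache, and the helper names `uri_tokens`, `ArchiveClassifier`, and `route` are all hypothetical.

```python
import math
import re
from collections import Counter


def uri_tokens(uri):
    """Split a URI into lowercase alphanumeric tokens (an assumed feature choice)."""
    return [t for t in re.split(r"[^a-z0-9]+", uri.lower()) if t]


class ArchiveClassifier:
    """Simplified naive Bayes over the tokens present in a URI, for one archive.

    Illustrative only; not the classifier described in the paper.
    """

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.class_counts = {True: 0, False: 0}

    def fit(self, samples):
        # samples: iterable of (uri, held), where held is True if the cache
        # recorded at least one memento for that URI in this archive.
        for uri, held in samples:
            self.class_counts[held] += 1
            self.token_counts[held].update(set(uri_tokens(uri)))
        return self

    def score(self, uri):
        """Return the smoothed log-odds that this archive holds the URI."""
        pos, neg = self.class_counts[True], self.class_counts[False]
        log_odds = math.log((pos + 1) / (neg + 1))
        for tok in set(uri_tokens(uri)):
            p_pos = (self.token_counts[True][tok] + 1) / (pos + 2)
            p_neg = (self.token_counts[False][tok] + 1) / (neg + 2)
            log_odds += math.log(p_pos / p_neg)
        return log_odds

    def should_query(self, uri, threshold=0.0):
        return self.score(uri) >= threshold


def route(uri, classifiers, threshold=0.0):
    """Return the names of the archives whose classifier votes to query."""
    return [name for name, clf in classifiers.items()
            if clf.should_query(uri, threshold)]


# Hypothetical cache contents: for each archive, URIs seen and whether it held them.
cache = {
    "uk_archive": [
        ("http://example.co.uk/news", True),
        ("http://bbc.co.uk/sport", True),
        ("http://example.com/blog", False),
        ("http://foo.com/page", False),
    ],
    "com_archive": [
        ("http://example.co.uk/news", False),
        ("http://bbc.co.uk/sport", False),
        ("http://example.com/blog", True),
        ("http://foo.com/page", True),
    ],
}

classifiers = {name: ArchiveClassifier().fit(rows) for name, rows in cache.items()}

for uri in ("http://shop.co.uk/items", "http://other.com/x"):
    print(uri, "->", route(uri, classifiers))
```

Raising `threshold` sends fewer requests at the cost of recall, and lowering it does the opposite; this is the trade-off the abstract quantifies (77% fewer requests at a recall of 0.847).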
Pages: 63-72
Page count: 10