Exploiting the untapped functional potential of Memento aggregators beyond aggregation

Times cited: 0
Author
Kelly, Mat [1]
Affiliation
[1] Drexel University, College of Computing & Informatics, Department of Information Science, Philadelphia, PA 19104, USA
Keywords
Web archives; Memento; Aggregator; Implementation; Optimization
DOI
10.1007/s00799-023-00391-0
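Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501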
Abstract
Web archives capture, retain, and present historical versions of web pages. Viewing web archives often amounts to a user visiting the Wayback Machine homepage, typing in a URL, and then choosing a capture by its date and time. Other web archives also capture the web and use Memento as an interoperable interface for querying their captures. Memento aggregators are web-accessible software packages that allow clients to send requests for past web pages to a single endpoint, which then relays each request to a set of web archives. Few deployed aggregator instances with this aggregation trait exist, and nearly all of them follow the same model: a request for the URI of an original resource (URI-R) is served to a client by first querying a collection of web archives and then aggregating the results of their responses. This single-tier querying need not be the logical flow of an aggregator, so long as a user can still use the aggregator from a single URL. In this paper, we discuss theoretical aggregation models of web archives. We first describe the status quo as the conventional behavior exhibited by an aggregator. We then build on prior work to describe a multi-tiered, structured querying model that an aggregator may exhibit. We highlight some potential issues and high-level optimizations to ensure efficient aggregation while also extending the state of the art of Memento aggregation. Part of our contribution is the extension of an open-source, user-deployable Memento aggregator to exhibit the capability described in this paper. We also extend a browser extension that typically consults an aggregator so that it can perform aggregation itself rather than consulting an external service. A purely client-side, browser-based Memento aggregator is novel to this work.
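The conventional single-tier flow described in the abstract (query every archive for a URI-R, then merge the responses) and one possible multi-tier variant can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation from the paper: each archive is modeled as a hypothetical in-process function returning (timestamp, URI-M) pairs instead of an HTTP TimeMap request as defined by the Memento protocol (RFC 7089), timestamps are assumed to be 14-digit UTC strings, and the tiering rule (`aggregate_tiered`, `min_results`) is a hypothetical policy chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# (14-digit UTC timestamp, URI-M), e.g.
# ("20200101000000", "https://web.archive.org/web/20200101000000/http://example.com/")
Memento = Tuple[str, str]
# Hypothetical stand-in for one archive's TimeMap lookup: URI-R -> list of mementos.
ArchiveQuery = Callable[[str], List[Memento]]


def aggregate(uri_r: str, archives: Dict[str, ArchiveQuery]) -> List[Memento]:
    """Single-tier aggregation: query every archive concurrently, then merge."""
    if not archives:
        return []
    merged: Dict[str, Memento] = {}
    with ThreadPoolExecutor(max_workers=len(archives)) as pool:
        futures = [pool.submit(query, uri_r) for query in archives.values()]
        for future in futures:
            try:
                for timestamp, uri_m in future.result():
                    merged[uri_m] = (timestamp, uri_m)  # de-duplicate by URI-M
            except Exception:
                # A failing archive is skipped rather than failing the
                # whole aggregated response.
                continue
    # 14-digit timestamps sort chronologically as plain strings.
    return sorted(merged.values())


def aggregate_tiered(
    uri_r: str,
    tiers: Sequence[Dict[str, ArchiveQuery]],
    min_results: int = 1,
) -> List[Memento]:
    """Hypothetical multi-tier policy: query tiers in order and stop once
    at least `min_results` distinct mementos have been collected."""
    collected: Dict[str, Memento] = {}
    for tier in tiers:
        for memento in aggregate(uri_r, tier):
            collected[memento[1]] = memento
        if len(collected) >= min_results:
            break  # later tiers are never contacted
    return sorted(collected.values())
```

Querying the archives of a tier concurrently keeps the response time close to that of the slowest archive rather than the sum over all archives; a deployed aggregator would additionally need per-archive timeouts, which this sketch omits. The tiered variant trades completeness for fewer upstream requests, since archives in later tiers are only consulted when earlier tiers return too little.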
Pages: 93-104
Page count: 12