The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

Cited by: 3
Authors
Reimer, Jan Heinrich [1]
Schmidt, Sebastian [2]
Froebe, Maik [1]
Gienapp, Lukas [2, 3]
Scells, Harrisen [2]
Stein, Benno [4]
Hagen, Matthias [1]
Potthast, Martin [2, 3]
Affiliations
[1] Friedrich Schiller Univ Jena, Jena, Germany
[2] Univ Leipzig, Leipzig, Germany
[3] ScaDS AI, Leipzig, Germany
[4] Bauhaus Univ Weimar, Weimar, Germany
Source
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023
Keywords
query log; search engine result page; information retrieval history; INFORMATION; USERS; LIFE
DOI
10.1145/3539618.3591890
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 137 million search result pages, and 1.4 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.
Pages: 2848-2860
Page count: 13