TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Cited by: 35
Authors
Chen, Jia [1]
Mao, Jiaxin [1]
Liu, Yiqun [1]
Zhang, Min [1]
Ma, Shaoping [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Inst Artificial Intelligence, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19) | 2019
Keywords
Test collection; Session search; Information Retrieval
DOI
10.1145/3357384.3358158
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of considering context information for search system optimization. The well-known TREC Session Tracks have advanced this domain to a great extent. However, their data are mainly collected via user studies or crowd-sourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigation with more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a huge search log and can thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM and human relevance labels provided by this dataset and report the performance as a reference for future studies of session search.
Pages: 2485-2488
Page count: 4