TianGong-ST: A New Dataset with Large-scale Refined Real-world Web Search Sessions

Times cited: 32
Authors
Chen, Jia [1]
Mao, Jiaxin [1]
Liu, Yiqun [1]
Zhang, Min [1]
Ma, Shaoping [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Inst Artificial Intelligence, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19) | 2019
Keywords
Test collection; Session search; Information Retrieval;
DOI
10.1145/3357384.3358158
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Web search session data is valuable for a wide range of Information Retrieval (IR) tasks, such as session search, query suggestion, and click-through rate (CTR) prediction. Numerous studies have shown the great potential of using context information to optimize search systems. The well-known TREC Session Tracks have greatly advanced research in this domain. However, their data are mainly collected via user studies or crowdsourcing experiments and normally contain only tens to thousands of sessions, which is insufficient for investigating more sophisticated models. To tackle this obstacle, we present a new dataset that contains 147,155 refined web search sessions with both click-based and human-annotated relevance labels. The sessions are sampled from a huge search log and can thus reflect real search scenarios. The proposed dataset can support a wide range of session-level or task-based IR studies. As an example, we test several interactive search models with both the PSCM-based and the human relevance labels provided by this dataset and report their performance as a reference for future studies of session search.
Pages: 2485-2488
Number of pages: 4
Related papers
1 in total
  • [1] QBSUM: A large-scale query-based document summarization dataset from real-world applications
    Zhao, Mingjun
    Yan, Shengli
    Liu, Bang
    Zhong, Xinwang
    Hao, Qian
    Chen, Haolan
    Niu, Di
    Long, Bowei
    Guo, Weidong
    COMPUTER SPEECH AND LANGUAGE, 2021, 66