Automatically Mining Facets for Queries from Their Search Results

Times cited: 13
Authors
Dou, Zhicheng [1, 2]
Jiang, Zhengbao [1, 2]
Hu, Sha [1, 2]
Wen, Ji-Rong [1, 2]
Song, Ruihua [3]
Affiliations
[1] Renmin Univ China, Beijing Key Lab Big Data Management & Anal Method, Sch Informat, Beijing 100872, Peoples R China
[2] Renmin Univ China, DEKE, Beijing 100872, Peoples R China
[3] Microsoft Res, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Query facet; faceted search; summarization; user intent
DOI
10.1109/TKDE.2015.2475735
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of finding query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query. We assume that the important aspects of a query are usually presented and repeated in the query's top retrieved documents in the form of lists, and that query facets can be mined by aggregating these significant lists. We propose a systematic solution, which we refer to as QDMiner, to automatically mine query facets by extracting and grouping frequent lists from free text, HTML tags, and repeat regions within the top search results. Experimental results show that a large number of lists do exist and that useful query facets can be mined by QDMiner. We further analyze the problem of list duplication and find that better query facets can be mined by modeling fine-grained similarities between lists and penalizing duplicated lists.
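The abstract describes QDMiner only at a high level. The following is a minimal, hypothetical Python sketch of the general extract-then-group idea, not the authors' implementation. It assumes well-formed HTML and extracts lists only from `<ul>`/`<ol>` and `<select>` tags (free-text patterns and repeat regions are omitted). It groups lists with a greedy Jaccard-similarity pass against each group's seed list, which is a stand-in and not the paper's own clustering or fine-grained similarity model. It approximates the duplicate penalty by letting each document support a group and an item at most once. The names `ListTagParser`, `mine_facets`, `threshold`, and `min_support` are illustrative.

```python
from html.parser import HTMLParser


class ListTagParser(HTMLParser):
    """Collects item texts from <ul>/<ol> (<li>) and <select> (<option>) blocks.

    Hypothetical helper: assumes closed item tags; nested lists are not handled specially.
    """

    CONTAINERS = {"ul": "li", "ol": "li", "select": "option"}

    def __init__(self):
        super().__init__()
        self.lists = []    # finished lists of normalized item strings
        self._stack = []   # open containers as (item_tag, items)
        self._item = None  # text buffer for the item currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self._stack.append((self.CONTAINERS[tag], []))
        elif self._stack and tag == self._stack[-1][0]:
            self._item = []

    def handle_data(self, data):
        if self._item is not None:
            self._item.append(data)

    def handle_endtag(self, tag):
        if self._stack and tag == self._stack[-1][0] and self._item is not None:
            # Normalize whitespace and case so "Red" and "red" match.
            text = " ".join("".join(self._item).split()).lower()
            if text:
                self._stack[-1][1].append(text)
            self._item = None
        elif tag in self.CONTAINERS and self._stack:
            _, items = self._stack.pop()
            if len(items) >= 2:  # a single item is not a useful list
                self.lists.append(items)


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def mine_facets(pages, threshold=0.5, min_support=2):
    """Groups similar lists across pages; returns [(support, ranked_items), ...]."""
    clusters = []  # each: seed items, supporting doc ids, item -> doc ids
    for doc_id, html in enumerate(pages):
        parser = ListTagParser()
        parser.feed(html)
        parser.close()
        for items in parser.lists:
            target = next(
                (c for c in clusters if jaccard(c["seed"], items) >= threshold), None
            )
            if target is None:
                target = {"seed": set(items), "docs": set(), "items": {}}
                clusters.append(target)
            # Sets of doc ids: a list repeated within one page adds no extra support.
            target["docs"].add(doc_id)
            for item in set(items):
                target["items"].setdefault(item, set()).add(doc_id)

    facets = []
    for c in clusters:
        if len(c["docs"]) >= min_support:
            # Rank items by the number of distinct supporting pages, then alphabetically.
            ranked = sorted(c["items"], key=lambda i: (-len(c["items"][i]), i))
            facets.append((len(c["docs"]), ranked))
    facets.sort(key=lambda f: -f[0])
    return facets


pages = [
    "<ul><li>Red</li><li>Blue</li><li>Green</li></ul>",
    "<select><option>red</option><option>blue</option><option>black</option></select>",
    "<ol><li>Small</li><li>Large</li></ol>",
]
for support, items in mine_facets(pages):
    print(support, items)  # prints: 2 ['blue', 'red', 'black', 'green']
```

Here the color lists from the first two pages share two of four distinct items (Jaccard 0.5) and merge into one facet, while the size list appears on only one page and falls below `min_support`.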
Pages: 385-397
Page count: 13