The Opposite of Smoothing: A Language Model Approach to Ranking Query-Specific Document Clusters

Cited by: 13
Authors
Kurland, Oren [1]
Krikon, Eyal [1]
Affiliation
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
Funding
National Science Foundation (United States); Israel Science Foundation
Keywords
INFORMATION;
DOI
10.1613/jair.3327
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also utilizes information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant document percentage. Furthermore, using the model to produce document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed. The performance also compares favorably with that of a state-of-the-art pseudo-feedback-based retrieval method.
Pages: 367-395
Number of pages: 29