A Comparative Study of Utilizing Topic Models for Information Retrieval

Cited by: 0
Authors
Yi, Xing [1]
Allan, James [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA
Source
ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS | 2009 / Vol. 5478
Keywords
Topic Model; Retrieval; Evaluation;
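DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract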
We explore the utility of different types of topic models for retrieval purposes. Based on prior work, we describe several ways that topic models can be integrated into the retrieval process. We evaluate the effectiveness of different types of topic models within those retrieval approaches. We show that: (1) topic models are effective for document smoothing; (2) more rigorous topic models such as Latent Dirichlet Allocation provide gains over cluster-based models; (3) more elaborate topic models that capture topic dependencies provide no additional gains; (4) smoothing documents by using their similar documents is as effective as smoothing them by using topic models; (5) query expansion should use topics discovered in the feedback documents rather than coarse-grained topics from the whole corpus; (6) incorporating topics from the feedback documents when building relevance models generally improves performance more for queries that have more relevant documents.
Pages: 29-41
Page count: 13