A Comparative Study of Utilizing Topic Models for Information Retrieval

Cited by: 0
Authors
Yi, Xing [1]
Allan, James [1]
Affiliation
[1] Univ Massachusetts, Dept Comp Sci, Ctr Intelligent Informat Retrieval, Amherst, MA 01003 USA
Source
ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS | 2009 / Vol. 5478
Keywords
Topic Model; Retrieval; Evaluation;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We explore the utility of different types of topic models for retrieval purposes. Based on prior work, we describe several ways that topic models can be integrated into the retrieval process. We evaluate the effectiveness of different types of topic models within those retrieval approaches. We show that: (1) topic models are effective for document smoothing; (2) more rigorous topic models such as Latent Dirichlet Allocation provide gains over cluster-based models; (3) more elaborate topic models that capture topic dependencies provide no additional gains; (4) smoothing documents by using their similar documents is as effective as smoothing them by using topic models; (5) query expansion should use topics discovered in the feedback documents rather than coarse-grained topics from the whole corpus; (6) in general, incorporating topics from the feedback documents when building relevance models yields larger gains for queries that have more relevant documents.
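As an illustration of what finding (1) refers to, the following is a minimal sketch of topic-based document smoothing. It assumes the interpolation form commonly used for LDA-based document models in prior work: a Dirichlet-smoothed document language model mixed with a topic-based word estimate. The function name, the parameters `mu` and `lam`, and the toy distributions are all hypothetical, and this is not necessarily the exact configuration evaluated in the paper.

```python
from collections import Counter


def smoothed_word_prob(word, doc_tokens, collection_prob, topic_word_prob,
                       doc_topic_prob, mu=1000.0, lam=0.7):
    """Mix a Dirichlet-smoothed document model with a topic-based estimate.

    All names and default values here are illustrative assumptions.
    """
    tf = Counter(doc_tokens)[word]
    # Dirichlet-smoothed estimate of P(word | document).
    p_dirichlet = (tf + mu * collection_prob.get(word, 0.0)) / (len(doc_tokens) + mu)
    # Topic-based estimate: sum over topics of P(word | topic) * P(topic | document).
    p_topic = sum(topic_word_prob[z].get(word, 0.0) * p_z
                  for z, p_z in doc_topic_prob.items())
    # Linear interpolation of the two estimates.
    return lam * p_dirichlet + (1.0 - lam) * p_topic


# Toy, hand-made distributions (hypothetical, not from the paper).
doc_tokens = ["topic", "model", "retrieval"]
collection_prob = {"topic": 0.2, "model": 0.2, "retrieval": 0.2,
                   "ranking": 0.1, "smoothing": 0.3}
topic_word_prob = {0: {"retrieval": 0.5, "ranking": 0.5},
                   1: {"topic": 0.5, "model": 0.5}}
doc_topic_prob = {0: 0.4, 1: 0.6}

# "ranking" never occurs in the document, yet topic 0 lends it probability mass.
p_ranking = smoothed_word_prob("ranking", doc_tokens, collection_prob,
                               topic_word_prob, doc_topic_prob, mu=2.0, lam=0.5)
p_no_topics = smoothed_word_prob("ranking", doc_tokens, collection_prob,
                                 topic_word_prob, doc_topic_prob, mu=2.0, lam=1.0)
```

In this toy setting the unseen word "ranking" receives a higher probability with the topic component (`lam=0.5`) than with Dirichlet smoothing alone (`lam=1.0`), which is the intuition behind using topics to smooth sparse document models.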
Pages: 29-41
Page count: 13