Indexing by Latent Dirichlet Allocation and an Ensemble Model

Cited by: 13
Authors
Wang, Yanshan [1]
Lee, Jae-Sung [2]
Choi, In-Chan [1]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, Seoul, South Korea
[2] Diquest, 501,Kolon Villant 2, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
information retrieval; machine learning; searching
DOI
10.1002/asi.23444
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
The contribution of this article is twofold. First, we present Indexing by latent Dirichlet allocation (LDI), an automatic document indexing method. Many ad hoc applications, or their variants with smoothing techniques suggested in LDA-based language modeling, can result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve document retrieval performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. EnM combines basic indexing models by assigning different weights and attempts to uncover the optimal weights to maximize the mean average precision. To solve the optimization problem, we propose an algorithm, which is derived based on the boosting method. The results of our computational experiments on benchmark data sets indicate that both the proposed approaches are viable options for document retrieval.
Pages: 1736-1750
Page count: 15