Search Result Diversification in Short Text Streams

Cited by: 13
Authors
Liang, Shangsong [1]
Yilmaz, Emine [1, 2]
Shen, Hong [3, 4]
De Rijke, Maarten [5]
Croft, W. Bruce [6]
Affiliations
[1] UCL, Dept Comp Sci, London, England
[2] Alan Turing Inst, London, England
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
[4] Univ Adelaide, Dept Comp Sci, Adelaide, SA, Australia
[5] Univ Amsterdam, Informat Inst, Amsterdam, Netherlands
[6] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
Keywords
Diversity; ad hoc retrieval; data streams;
DOI
10.1145/3057282
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models.
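The abstract says SDA builds on a modified version of PM-2 but does not give the modification. As a rough illustration only, the following is a minimal sketch of the standard PM-2 greedy re-ranking as it is commonly described, not the authors' modified variant and not D2M3. The function name `pm2_rerank`, the parameter names, and the toy probabilities are all hypothetical; in SDA the per-topic document probabilities would presumably come from the topic model rather than being hard-coded.

```python
from typing import Dict, List


def pm2_rerank(
    rel: Dict[str, Dict[str, float]],  # rel[doc][topic] ~ P(doc | topic); hypothetical input
    weights: Dict[str, float],         # topic weights ("votes"); hypothetical input
    k: int,
    lam: float = 0.5,                  # trade-off between the chosen topic and the others
) -> List[str]:
    """Standard PM-2 greedy selection (illustrative sketch, not the paper's variant)."""
    seats = {t: 0.0 for t in weights}  # portion of the ranking already given to each topic
    remaining = set(rel)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        # Sainte-Lague quotient: topics with high weight and few seats come first.
        quotient = {t: weights[t] / (2.0 * seats[t] + 1.0) for t in weights}
        best_topic = max(quotient, key=quotient.get)

        def score(doc: str) -> float:
            own = quotient[best_topic] * rel[doc].get(best_topic, 0.0)
            others = sum(
                quotient[t] * rel[doc].get(t, 0.0)
                for t in weights
                if t != best_topic
            )
            return lam * own + (1.0 - lam) * others

        # sorted() makes tie-breaking deterministic.
        best_doc = max(sorted(remaining), key=score)
        ranking.append(best_doc)
        remaining.remove(best_doc)

        # Share the newly filled seat among topics in proportion to the document's coverage.
        total = sum(rel[best_doc].get(t, 0.0) for t in weights)
        if total > 0:
            for t in weights:
                seats[t] += rel[best_doc].get(t, 0.0) / total
    return ranking


# Toy data (made up for illustration): two documents on topic "a", one on topic "b".
toy_rel = {
    "d1": {"a": 0.9, "b": 0.0},
    "d2": {"a": 0.8, "b": 0.0},
    "d3": {"a": 0.0, "b": 0.7},
}
toy_weights = {"a": 0.5, "b": 0.5}
print(pm2_rerank(toy_rel, toy_weights, k=3))
```

With equal topic weights, the sketch places the topic "b" document second, ahead of the second topic "a" document, which is the proportionality behaviour PM-2 is designed to produce.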
Pages: 35