Finding Semantically Valid and Relevant Topics by Association-Based Topic Selection Model

Cited by: 14
Authors
Gao, Yang [1, 2, 3]
Li, Yuefeng [4]
Lau, Raymond Y. K. [5]
Xu, Yue [4]
Bashar, Md Abul [4]
Affiliations
[1] Beijing Engn Res Ctr Mass Language Informat Proc, Beijing, Peoples R China
[2] Beijing Inst Technol, Beijing, Peoples R China
[3] Beijing Adv Innovat Ctr Imaging Technol, Beijing, Peoples R China
[4] Queensland Univ Technol, Sch Elect Engn & Comp Sci, Brisbane, Qld, Australia
[5] City Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Topic selection; topic evaluation; topic components; information filtering
DOI
10.1145/3094786
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Topic modelling methods such as Latent Dirichlet Allocation (LDA) have been successfully applied in various fields, since these methods can effectively characterize document collections by using a mixture of semantically rich topics. Many such models have been proposed. However, existing models typically perform well when analysing the whole collection to find all topics but have difficulty capturing coherent and specifically meaningful topic representations. Furthermore, it is very challenging to incorporate user preferences into existing topic modelling methods to extract relevant topics. To address these problems, we develop a novel personalized Association-based Topic Selection (ATS) model, which identifies semantically valid and relevant topics from a set of raw topics based on the semantic relatedness between users' preferences and the structured patterns captured in topics. The advantage of the proposed ATS model is that it enables an interactive topic modelling process driven by users' specific interests. Our experiments on three benchmark datasets, namely RCV1, R8, and WT10G, in the context of information filtering (IF) and information retrieval (IR) show that the proposed ATS model can effectively identify topics relevant to users' specific interests and thereby improve the performance of IF and IR.
Pages: 22