Towards the Improvement of a Topic Model with Semantic Knowledge

Cited by: 2
Authors
Ferrugento, Adriana [1]
Alves, Ana [1, 2]
Oliveira, Hugo Goncalo [1]
Rodrigues, Filipe [1]
Affiliations
[1] Univ Coimbra, Dept Informat Engn, CISUC, Coimbra, Portugal
[2] Polytech Inst Coimbra, Coimbra Inst Engn, Coimbra, Portugal
Source
PROGRESS IN ARTIFICIAL INTELLIGENCE-BK | 2015 / Vol. 9273
Keywords
Topic model; Semantics; WordNet; SemLDA
DOI
10.1007/978-3-319-23485-4_76
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although typically used in classic topic models, surface words cannot represent meaning on their own. Consequently, redundancy is common in those topics, which may, for instance, include synonyms. To face this problem, we present SemLDA, an extended topic model that incorporates semantics from an external lexical-semantic knowledge base. SemLDA is introduced and explained in detail, pointing out where semantics is included in both the pre-processing and the generative phase of topic distributions. As a result, instead of topics as distributions over words, we obtain distributions over concepts, each represented by a set of synonymous words. In order to evaluate SemLDA, we applied preliminary qualitative tests automatically against a state-of-the-art classical topic model. The results were promising and confirm our intuition towards the benefits of incorporating general semantics in a topic model.
Pages: 759-770
Page count: 12