Quarry Meaning: A Topic Model Application focused on Spanish Documents

Times cited: 0
Authors
Acosta, Olga [1]
Aguilar, Cesar [1]
Araya, Fabiola [1]
Affiliations
[1] Pontificia Univ Catolica Chile, Fac Letras, Campus San Joaquin, Santiago, Chile
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2018 / Issue 61
Keywords
Natural language processing; text mining; topic modeling; contrastive approach; text classification;
DOI
10.26342/2018-61-31
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This demo presents a standalone application for easily training and testing a topic model. The application includes filters for reducing noise in the results. On the one hand, a base stop-list is included, which can be complemented with a list of non-relevant words proposed by the user or obtained by means of a contrastive approach using a reference corpus. On the other hand, words with high semantic value can be selected using POS tags. We also include a word-cloud visualization, in which ten topics can be shown, in order to analyze the results in detail. Finally, the evaluation focused on using the topic model to classify documents. Our model achieved precision above 95% on the test set.
Pages: 197-200
Number of pages: 4