Incorporating Word Embedding into Cross-lingual Topic Modeling

Cited by: 4
Authors
Chang, Chia-Hsuan [1]
Hwang, San-Yih [1]
Xui, Tou-Hsiang [1]
Affiliations
[1] Natl Sun Yat Sen Univ, Dept Informat Management, Kaohsiung, Taiwan
Source
2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS) | 2018
Keywords
cross-lingual topic model; text mining; Latent Dirichlet Allocation; word space;
DOI
10.1109/BigDataCongress.2018.00010
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper, we address cross-lingual topic modeling, an important technique that enables global enterprises to detect and compare topic trends across global markets. Previous work on cross-lingual topic modeling has proposed methods that use parallel or comparable corpora to construct a polylingual topic model. However, such corpora are in many cases not available. In this research, we combine techniques for mapping cross-lingual word spaces with topic modeling (LDA) and propose two methods: Translated Corpus with LDA (TC-LDA) and Post Match LDA (PM-LDA). The cross-lingual word space mapping allows us to compare words of different languages, and LDA enables us to group words into topics. Neither TC-LDA nor PM-LDA needs a parallel or comparable corpus, so both apply to more domains. The effectiveness of both methods is evaluated using UM-Corpus and WS-353. Our evaluation results indicate that both methods are able to identify similar documents written in different languages. In addition, PM-LDA is shown to achieve better performance than TC-LDA, especially when documents are short.
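The abstract does not say how the cross-lingual word space mapping is learned. As a minimal sketch only, the following assumes one common approach, an orthogonal (Procrustes) mapping fitted on a small seed dictionary, and then compares words across languages by cosine similarity. The function names, the toy vectors, and the romanized target words are all hypothetical and are not taken from the paper.

```python
import numpy as np

def learn_orthogonal_map(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def nearest_target_words(vec, tgt_matrix, tgt_words, k=1):
    """Return the k target words whose vectors have the highest cosine similarity to vec."""
    tgt_norm = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    sims = tgt_norm @ (vec / np.linalg.norm(vec))
    top = np.argsort(-sims)[:k]
    return [tgt_words[i] for i in top]

# Toy seed dictionary (assumption): row i of each matrix is a translation pair.
rng = np.random.default_rng(0)
src_words = ["cat", "dog", "car", "tree"]
tgt_words = ["mao", "gou", "che", "shu"]  # hypothetical romanized target words
src_vecs = rng.normal(size=(4, 3))

# Simulate a target embedding space that is a rotated copy of the source space.
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
tgt_vecs = src_vecs @ rotation

# Fit the mapping, project source words into the target space, and look up neighbors.
W = learn_orthogonal_map(src_vecs, tgt_vecs)
mapped = src_vecs @ W
translations = [nearest_target_words(v, tgt_vecs, tgt_words)[0] for v in mapped]
print(dict(zip(src_words, translations)))
```

On real embeddings the two spaces are only approximately related by a rotation, so the lookup would return near matches rather than exact ones. How TC-LDA and PM-LDA actually consume such a mapping is not specified in the abstract.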
Pages: 17-24
Number of pages: 8
Related papers
39 in total
[1]  
[Anonymous], 2003, P 26 ANN INT ACM SIG
[2]  
[Anonymous], 2011, AAAI
[3]  
Artetxe Mikel, 2016, P 2016 C EMPIRICAL M, P2289, DOI 10.18653/v1/D16-1250
[4]  
Bengio Y, 2001, ADV NEUR IN, V13, P932
[5]   Using latent semantic indexing for multilanguage information retrieval [J].
Berry, MW ;
Young, PG .
COMPUTERS AND THE HUMANITIES, 1995, 29 (06) :413-429
[6]   Latent Dirichlet allocation [J].
Blei, DM ;
Ng, AY ;
Jordan, MI .
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) :993-1022
[7]  
Collobert R., 2008, P 25 ICML, P160, DOI 10.1145/1390156.1390177
[8]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[10]  
Dinu G., 2014, ARXIV14126568