Deep learning multi-language topic alignment model across domains

Authors
Yu C. [1]
Yuan S. [2]
Hu S. [1]
An L. [3]
Affiliations
[1] School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan
[2] School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan
[3] School of Information Management, Wuhan University, Wuhan
Source
An, Lu (anlu97@163.com) | Tsinghua University, Vol. 60, 2020
Keywords
Bilingual word embedding; Cross-domain topic alignment; Cross-lingual topic alignment; Deep learning; Knowledge alignment
DOI
10.16511/j.cnki.qhdxxb.2020.21.003
Abstract
Deep representation learning of domain topics was used to build a topic alignment model (TAM) that integrates bilingual word embeddings. The semantic alignment lexicon was extended with bilingual word embeddings. An auxiliary distribution derived from a traditional bilingual topic model was used to strengthen semantic sharing among word distributions and thereby improve topic alignment in cross-lingual and cross-domain contexts. A bilingual topic similarity (BTS) indicator and a bilingual alignment similarity (BAS) indicator were developed to evaluate the supplementary alignment. The bilingual alignment similarity for cross-language topic matching improved by about 1.5% compared with a traditional multi-language common cultural theme analysis, and F1 for cross-domain topic alignment improved by about 10%. These results can improve cross-language and cross-domain information processing. © 2020, Tsinghua University Press. All rights reserved.
Pages: 430-439
Number of pages: 9