Lifelong Learning of Topics and Domain-Specific Word Embeddings

Cited by: 0
Authors
Qin, Xiaorui [1 ]
Lu, Yuyin [1 ]
Chen, Yufu [1 ]
Rao, Yanghui [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021 | 2021
Funding
National Natural Science Foundation of China;
DOI
None
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Lifelong topic models mainly focus on in-domain text streams in which each chunk contains documents from a single domain only. To cope with the data diversity of in-domain corpora, most existing methods exploit information from limited sources in a separate and heuristic manner. In this study, we develop a lifelong collaborative model (LCM) based on non-negative matrix factorization to accurately learn topics and domain-specific word embeddings. In particular, LCM investigates: (1) building a knowledge graph from the semantic relationships among words in the lifelong learning process, so as to accumulate the global context information discovered by topic models and the local context information reflected by context word embeddings from previous domains, and (2) building a subword graph based on byte pair encoding and pairwise word relationships to exploit the subword information of words in the current in-domain corpus. To the best of our knowledge, we are the first to collaboratively learn topics and word embeddings via lifelong learning. Experiments on real-world in-domain text streams validate the effectiveness of our method.
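To make the factorization backbone concrete, the sketch below shows plain NMF topic learning on a toy word-by-document count matrix. This is an illustrative sketch only, not the paper's LCM: it uses the standard multiplicative-update rules (Lee & Seung) and omits the knowledge-graph and subword-graph regularizers that the paper adds; the matrix `V` and function `nmf` are invented for illustration.

```python
import numpy as np

def nmf(V, k, iters=300, seed=0):
    """Factorize V ~= W @ H with nonnegative factors via multiplicative updates.

    Illustrative sketch of basic NMF topic modeling; NOT the paper's LCM,
    which augments this objective with graph-based regularization terms.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3   # word-topic factor (n words x k topics)
    H = rng.random((k, m)) + 1e-3   # topic-document factor (k topics x m docs)
    eps = 1e-9                      # guard against division by zero
    for _ in range(iters):
        # Standard multiplicative update rules; keep factors nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Tiny rank-2 word-by-document count matrix (rows: words, cols: documents).
V = np.array([[3.0, 0.0, 1.0, 0.0],
              [6.0, 0.0, 2.0, 0.0],
              [0.0, 4.0, 0.0, 2.0],
              [0.0, 2.0, 0.0, 1.0]])

W, H = nmf(V, k=2)
print(np.round(W @ H, 2))  # reconstruction approximates V
```

Each column of `W` can be read as a topic (a nonnegative weighting over words), and each column of `H` as a document's mixture over topics; LCM additionally constrains such factors with accumulated cross-domain knowledge.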
Pages: 2294-2309
Page count: 16