Pre-training Multi-party Dialogue Models with Latent Discourse Inference

Cited by: 0
Authors
Li, Yiyang [1 ,2 ]
Huang, Xinting [3 ]
Bi, Wei [3 ]
Zhao, Hai [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
[3] Tencent AI Lab, NLP Ctr, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1 | 2023
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-party dialogues are harder for models to understand than one-to-one two-party dialogues, because they involve multiple interlocutors and therefore interwoven reply-to relations and information flows. An effective way to overcome these obstacles is to pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying. However, because explicitly annotated discourse labels are scarce in multi-party dialogue corpora, previous works fail to scale up the pre-training process and leave the unlabeled multi-party conversational data unexploited. To make full use of the unlabeled data, we propose to treat the discourse structures as latent variables, and to jointly infer them and pre-train the discourse-aware model with unsupervised latent variable inference methods. Experiments on multiple downstream tasks show that our pre-trained model outperforms strong baselines by large margins and achieves state-of-the-art (SOTA) results, justifying the effectiveness of our method. The official implementation of this paper is available at https://github.com/EricLee8/MPD_EMVI.
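To make the idea of latent discourse inference concrete, the sketch below shows one common way such a scheme can be set up: the parent ("reply-to") utterance of each utterance is treated as a latent categorical variable and sampled with a Gumbel-Softmax relaxation so that the link scorer can be trained jointly with the dialogue encoder. This is an illustrative assumption-based example, not the authors' released implementation (see the repository linked above); the class name LatentDiscourseInference, the bilinear scorer, and the parameter tau are hypothetical.

    # Illustrative sketch only (hypothetical names); not the paper's released code.
    # Each utterance's reply-to parent is a latent categorical variable, sampled
    # differentiably with a Gumbel-Softmax relaxation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentDiscourseInference(nn.Module):
        def __init__(self, hidden_size: int = 768, tau: float = 1.0):
            super().__init__()
            self.tau = tau  # temperature of the Gumbel-Softmax relaxation
            # Bilinear scorer for "utterance i replies to utterance j" (j < i)
            self.scorer = nn.Bilinear(hidden_size, hidden_size, 1)

        def forward(self, utt_reprs: torch.Tensor):
            # utt_reprs: (num_utts, hidden_size) pooled utterance representations
            num_utts = utt_reprs.size(0)
            parents = []
            for i in range(1, num_utts):
                cur = utt_reprs[i].unsqueeze(0).repeat(i, 1)          # (i, hidden)
                scores = self.scorer(cur, utt_reprs[:i]).squeeze(-1)  # (i,)
                # Differentiable, approximately one-hot sample of the parent utterance
                parents.append(F.gumbel_softmax(scores, tau=self.tau, hard=True))
            return parents  # one relaxed parent choice per non-first utterance

In such a setup, the sampled parent links would be fed into a discourse-aware encoder and the whole model trained end-to-end with an ordinary pre-training objective (e.g., masked language modeling or response selection), so that gradients reach the link scorer through the relaxation even though no discourse labels are available.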
Pages: 9584-9599
Page count: 16