Pre-training Multi-party Dialogue Models with Latent Discourse Inference

Cited by: 0
Authors
Li, Yiyang [1 ,2 ]
Huang, Xinting [3 ]
Bi, Wei [3 ]
Zhao, Hai [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai, Peoples R China
[3] Tencent AI Lab, NLP Ctr, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1 | 2023
Funding
National Natural Science Foundation of China
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-party dialogues are more difficult for models to understand than two-party dialogues, since they involve multiple interlocutors and therefore interweaving reply-to relations and information flows. An effective way to overcome these obstacles is to pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying. However, because multi-party dialogue corpora lack explicitly annotated discourse labels, previous works cannot scale up the pre-training process and simply discard the unlabeled multi-party conversational data. To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model with unsupervised latent-variable inference methods. Experiments on multiple downstream tasks show that our pre-trained model outperforms strong baselines by large margins and achieves state-of-the-art (SOTA) results, demonstrating the effectiveness of our method. The official implementation of this paper is available at https://github.com/EricLee8/MPD_EMVI.
Pages: 9584-9599
Page count: 16
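
Editor's note: the abstract describes an EM-style recipe, namely treating the reply-to link of each utterance as a latent variable, inferring a posterior over those links (E-step), and minimizing the expected pre-training loss under that posterior (M-step). The following is a minimal, hypothetical PyTorch sketch of this idea, not the authors' implementation (see the linked repository for that); link_logits, theta, and loss_if_link are illustrative stand-ins for a real encoder's outputs and parameters.

    import torch
    import torch.nn.functional as F

    T = 5  # utterances in a toy multi-party dialogue

    # Hypothetical link-scorer output: link_logits[t, s] rates
    # "utterance t replies to utterance s".
    link_logits = torch.randn(T, T)

    # Hypothetical model parameters; loss_if_link[t, s] stands in for the
    # discourse-aware pre-training loss computed under the assumption
    # that utterance t replies to utterance s.
    theta = torch.randn(T, T, requires_grad=True)
    loss_if_link = theta ** 2

    # E-step: posterior over latent parents. Utterance t may only reply
    # to an earlier utterance s < t; utterance 0 has no parent, so its
    # row is dropped before the softmax.
    parent_mask = torch.tril(torch.ones(T, T, dtype=torch.bool), diagonal=-1)
    masked_logits = link_logits.masked_fill(~parent_mask, float("-inf"))
    posterior = F.softmax(masked_logits[1:], dim=-1)  # shape (T-1, T)

    # M-step: minimize the expected pre-training loss under the fixed
    # posterior; gradients update the model parameters, not the
    # inferred discourse structure.
    expected_loss = (posterior * loss_if_link[1:]).sum()
    expected_loss.backward()
    print(f"expected loss: {expected_loss.item():.4f}")

In the actual method the two steps would alternate over large unlabeled corpora, with the link scorer and the dialogue model presumably sharing an encoder; the toy tensors above only show how the latent reply-to posterior weights the pre-training loss.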