Modularized Pre-Training for End-to-End Task-Oriented Dialogue

Cited by: 4
Authors
Qin, Libo [1 ]
Xu, Xiao [2 ]
Wang, Lehan [2 ]
Zhang, Yue [3 ]
Che, Wanxiang [2 ]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Harbin Inst Technol, Res Ctr Social Comp & Informat Retrieval, Harbin 150001, Heilongjiang, Peoples R China
[3] Westlake Univ, Sch Engn, Hangzhou 310024, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Task analysis; Knowledge based systems; Automobiles; Training; History; Vehicles; Speech processing; Task-oriented dialogue system; modularized pre-training; consistency-guided data augmentation;
DOI
10.1109/TASLP.2023.3244503
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Pre-training for end-to-end task-oriented dialogue systems (EToDs) is challenging because of the unique need for knowledge base queries (accuracy) and the lack of sufficient training data (fluency). In this paper, we mitigate these challenges by introducing a modularized pre-training framework for EToDs, which effectively improves both the accuracy and the fluency of EToDs through a pre-training paradigm. The core insight is a modular design that decomposes EToDs into a generation (fluency) module and a knowledge-retriever (accuracy) module, allowing us to optimize each module by pre-training the two sub-modules with different well-designed pre-training tasks. In addition, such a modularized paradigm enables us to make full use of large amounts of KB-free dialogue corpora to pre-train the generation module, which alleviates the insufficient-training-data problem. Furthermore, we introduce a new consistency-guided data augmentation (CGDA) strategy to cope with data scarcity and better pre-train the knowledge-retriever module. Finally, we fine-tune the pre-trained generation module and knowledge-retriever module jointly. Experimental results on three datasets show that our model achieves superior performance in terms of both fluency and accuracy. To our knowledge, this is the first work to explore modularized pre-training methods for EToDs.
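The modular decomposition the abstract describes — a knowledge-retriever module for accuracy and a generation module for fluency, composed into one pipeline — can be sketched as a toy in plain Python. This is only an illustrative assumption of the interface, not the paper's actual models: the token-overlap retriever, the template generator, and all function names here are hypothetical stand-ins for the learned, jointly fine-tuned sub-modules.

```python
# Hypothetical sketch of the modularized EToD design: a retriever module
# selects a KB record for the dialogue context, and a generator module
# produces the response. All logic here is an illustrative stand-in.

def retrieve(kb, context_tokens):
    """Knowledge-retriever module (accuracy): pick the KB row sharing the
    most tokens with the dialogue context (stand-in for a learned retriever)."""
    def overlap(row):
        return len(set(context_tokens) & {str(v).lower() for v in row.values()})
    return max(kb, key=overlap)

def generate(context_tokens, record):
    """Generation module (fluency): fill a response template with retrieved
    KB values (stand-in for a pre-trained seq2seq generator)."""
    return f"{record['name']} is at {record['address']}."

def respond(kb, utterance):
    """Composed pipeline; in the paper the two modules are pre-trained
    separately and then fine-tuned jointly."""
    tokens = utterance.lower().split()
    return generate(tokens, retrieve(kb, tokens))

kb = [
    {"name": "Pizza Hut", "address": "12 Main St", "cuisine": "pizza"},
    {"name": "Sushi Bar", "address": "3 Oak Ave", "cuisine": "sushi"},
]
print(respond(kb, "Where can I get some sushi ?"))
```

The point of the decomposition is that `generate` can be pre-trained on large KB-free dialogue corpora while `retrieve` is pre-trained with KB-grounded tasks (aided by CGDA), before the two are fine-tuned together.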
Pages: 1601-1610 (10 pages)