Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning

Cited by: 2
Authors
Zhao, Yangyang [1 ]
Yin, Kai [2 ]
Wang, Zhenyu [2 ]
Dastani, Mehdi [3 ]
Wang, Shihan [3 ]
Affiliations
[1] Changsha Univ Sci & Technol, Dept Comp & Commun Engn, Changsha 410000, Peoples R China
[2] South China Univ Technol, Dept Software, Guangzhou 510000, Peoples R China
[3] Univ Utrecht, Dept Informat & Comp Sci, NL-3508 Utrecht, Netherlands
Keywords
Reinforcement learning; Periodic structures; dialogue policy; action space inflation; incoherence problem; REINFORCEMENT; DECISION; SYSTEMS
DOI
10.1109/TASLP.2024.3357038
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Reinforcement learning (RL) has emerged as a key technique for designing dialogue policies. However, action space inflation in dialogue tasks imposes a heavy decision burden on dialogue policies and leads to incoherence problems. In this paper, we propose a novel decomposed deep Q-network (D2Q) that exploits the natural structure of dialogue actions to decompose the Q-function, enabling efficient and coherent dialogue policy learning. Instead of directly evaluating the Q-function, D2Q uses two separate estimators, one for the abstract action-value function and the other for the specific action-value function, both sharing a common feature layer. The abstract action-value function determines the speech act of the system action, while the specific action-value function selects the concrete action. This structure establishes a logical relationship between user and system speech acts, avoiding the incoherence problem. Moreover, the abstract action-value function screens out unreasonable specific actions in the inflated action space, reducing decision complexity. Our results show that the incoherence problem is prevalent in existing approaches and significantly impacts the efficiency and quality of dialogue policy learning. Our D2Q architecture alleviates this problem and performs significantly better than competitive baselines in both simulated and human experiments. Further experiments validate the generality of our method: it can be easily extended to other RL-based dialogue policy approaches.
Pages: 1380-1391
Page count: 12
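The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the speech-act inventory, slot counts, layer sizes, and random weights below are all hypothetical placeholders, and a real D2Q agent would train these heads with temporal-difference learning. The sketch only shows the action-selection structure: a shared feature layer feeds an abstract head (Q over speech acts) and a specific head (Q over concrete actions), and the chosen speech act masks out inconsistent specific actions.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 8
HIDDEN = 16
N_ACTS = 3            # hypothetical speech acts, e.g. inform / request / confirm
SLOTS_PER_ACT = 4     # hypothetical number of concrete actions per speech act
N_SPECIFIC = N_ACTS * SLOTS_PER_ACT

# Shared feature layer and two heads; random weights stand in for a trained network.
W_feat = rng.normal(size=(STATE_DIM, HIDDEN))
W_abs = rng.normal(size=(HIDDEN, N_ACTS))       # abstract action-value head
W_spec = rng.normal(size=(HIDDEN, N_SPECIFIC))  # specific action-value head

def d2q_select(state):
    """Choose a speech act from the abstract head, then the best specific
    action among those consistent with that act (others are masked out)."""
    h = np.tanh(state @ W_feat)          # shared features
    q_abs = h @ W_abs                    # Q-values over speech acts
    q_spec = h @ W_spec                  # Q-values over concrete actions
    act = int(np.argmax(q_abs))          # abstract decision comes first
    mask = np.full(N_SPECIFIC, -np.inf)
    lo, hi = act * SLOTS_PER_ACT, (act + 1) * SLOTS_PER_ACT
    mask[lo:hi] = 0.0                    # only this act's actions remain eligible
    specific = int(np.argmax(q_spec + mask))
    return act, specific

act, specific = d2q_select(rng.normal(size=STATE_DIM))
print(f"speech act {act}, specific action {specific}")
```

Because the specific head is evaluated only under the chosen speech act, the inflated action space is pruned at decision time, which is the mechanism the abstract credits for both the reduced decision complexity and the improved coherence.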