Graph representation based reward shaping approach for addressing reward sparsity in task-oriented dialogue systems

Cited by: 0
Authors
Saffari, Shaghayegh [1 ]
Dorrigiv, Morteza [1 ]
Yaghmaee, Farzin [1 ]
Affiliations
[1] Semnan Univ, Dept Elect & Comp Engn, Semnan 1911135131, Iran
Keywords
Task-oriented dialogue systems; Reward sparsity; Reward shaping; Reinforcement learning; Graph representation learning; MinCutPool;
DOI
10.1016/j.neucom.2025.131058
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Task-Oriented Dialogue (TOD) systems rely on sequential decision-making to assist users in accomplishing specific goals. Due to this sequential nature, Reinforcement Learning (RL) has been widely adopted to train dialogue systems. However, reward sparsity remains a fundamental challenge in RL-based TOD systems, leading to inefficient exploration and suboptimal policy learning. While prior studies have focused on adopting advanced RL techniques to train TOD agents, the design of an effective reward function has received less attention. To address this challenge, this paper proposes a novel Potential-Based Reward Shaping (PBRS) approach that integrates graph representation learning with spectral graph clustering. This approach leverages graph representation learning algorithms to capture both structural and semantic relationships between dialogue states, facilitating more effective reward propagation. Moreover, the integration of Minimum Cut Pooling (MinCutPool) enables adaptive soft clustering, which promotes generalization across large and dynamic state spaces and remains robust to unseen or isolated states caused by frequent task switching or goal resetting. Extensive experiments were conducted to assess the effectiveness of the proposed approach on three benchmark tasks collected via Amazon Mechanical Turk. Evaluation metrics include success rate, average reward, dialogue turns, inference latency, number of parameters, CPU training time, and maximum CPU memory. Specifically, on the movie ticket booking task, the proposed GAT+MinCut+CSR model achieved 93.00%, 28.14, 7.14, 96 ms, 80,143, 19,982 s, and 7 GB on these metrics, respectively, demonstrating superior performance and computational efficiency compared to state-of-the-art baselines.
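The core mechanism the abstract builds on, potential-based reward shaping, can be illustrated with a minimal sketch. This is not the paper's method: the toy potential values below stand in for the potentials that the paper derives from graph representation learning with MinCutPool, and the discount factor is an assumed value.

```python
# Minimal sketch of potential-based reward shaping (PBRS).
# In the paper, Phi(s) would come from a learned graph representation
# (GAT + MinCutPool soft clustering); here it is a hypothetical lookup.

GAMMA = 0.99  # discount factor (assumed value, not from the paper)

def shaped_reward(env_reward, phi_s, phi_s_next, gamma=GAMMA):
    """Augment a sparse environment reward with the shaping term
    F(s, s') = gamma * Phi(s') - Phi(s). Because F is a potential
    difference, the optimal policy is provably unchanged."""
    return env_reward + gamma * phi_s_next - phi_s

# Toy potentials: dialogue states closer to task success get higher Phi,
# so intermediate transitions receive informative, dense feedback even
# when the environment reward is zero until the final turn.
phi = {"greet": 0.0, "slots_filled": 0.5, "confirmed": 0.9, "success": 1.0}

dense_r = shaped_reward(0.0, phi["slots_filled"], phi["confirmed"])
```

Even with a zero environment reward, the agent receives a positive shaped signal for moving from `slots_filled` to `confirmed`, which is how PBRS mitigates the reward sparsity the abstract describes.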
Pages: 23