Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, with the user goal unavailable in real-world scenarios, the reward estimator is unable to generate rewards that reflect action validity and task completion. These issues can significantly slow down and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we propose leveraging self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning, improving the task-completion success rate by 0.47%–9.01% over several strong baselines.
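The abstract's central idea — a compact student reward estimator distilled from a goal-aware teacher, with self-paced learning selecting which samples it trains on — can be sketched in a few lines. This is a minimal illustrative mock-up, not the paper's actual architecture: the teacher and student here are tiny linear/tanh functions, and all names, shapes, and the threshold schedule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d_s, d_g = 64, 4, 3           # samples, state dim, user-goal dim
w_t_s = rng.normal(size=(d_s,))  # teacher weights over the dialogue state
w_t_g = rng.normal(size=(d_g,))  # teacher weights over the (privileged) user goal
states = rng.normal(size=(n, d_s))
goals = rng.normal(size=(n, d_g))

def teacher_reward(states, goals):
    # Stand-in for a large teacher with access to the user goal.
    return np.tanh(states @ w_t_s + goals @ w_t_g)

def student_reward(states, w):
    # Compact student: sees only the dialogue state, never the goal.
    return states @ w

def self_paced_distill(states, goals, w, lam, lr=0.1, steps=200):
    """One curriculum stage: fit the student to the teacher's rewards,
    but only on samples whose current distillation loss is below the
    self-paced threshold `lam` (i.e. the "easy" samples)."""
    for _ in range(steps):
        target = teacher_reward(states, goals)   # soft labels from the teacher
        pred = student_reward(states, w)
        per_sample = (pred - target) ** 2        # per-sample distillation loss
        v = (per_sample < lam).astype(float)     # 0/1 self-paced weights
        grad = 2 * (v * (pred - target)) @ states / len(states)
        w = w - lr * grad
    return w, per_sample

w = np.zeros(d_s)
# Schedule: raise the threshold so harder samples gradually enter training,
# until the final stage distills on the full batch.
for lam in (0.5, 1.0, np.inf):
    w, losses = self_paced_distill(states, goals, w, lam)
print(losses.mean())
```

Because the student never observes the goal, some residual loss remains; the curriculum only orders the samples it learns from, which is the stabilising effect the abstract attributes to self-paced learning.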
Pages: 28