Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Formulating dialogue policy learning as a reinforcement learning (RL) task enables a dialogue system to act optimally through interaction with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, because the user goal is unavailable in real scenarios, the reward estimator cannot generate rewards that reflect action validity and task completion. These issues can significantly slow and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we leverage self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning and improves the task-completion success rate by 0.47% to 9.01% over several strong baselines.
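To make the two ideas in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of distilling a large teacher reward estimator into a compact student, with a hard self-paced weighting that admits easy samples (low current loss) first and raises the admission threshold over time. The architectures, dimensions, loss, and threshold schedule are illustrative assumptions, not the paper's actual design.

    # Hypothetical sketch: teacher-to-student reward distillation with
    # self-paced sample selection. Not the paper's exact method.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardEstimator(nn.Module):
        def __init__(self, state_dim, hidden_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
        def forward(self, x):
            return self.net(x).squeeze(-1)

    state_dim = 128                            # assumed dialogue-state feature size
    teacher = RewardEstimator(state_dim, 256)  # large model, assumed trained with user-goal access
    student = RewardEstimator(state_dim, 64)   # compact model used during policy learning
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    def self_paced_weights(losses, threshold):
        # Hard self-paced regulariser: keep only samples whose current
        # loss is below the threshold; harder samples enter as it grows.
        return (losses.detach() < threshold).float()

    threshold, growth = 0.5, 1.1               # illustrative schedule
    for epoch in range(10):
        states = torch.randn(64, state_dim)    # stand-in for batched dialogue states
        with torch.no_grad():
            target = teacher(states)           # teacher reward as distillation target
        pred = student(states)
        per_sample = F.mse_loss(pred, target, reduction="none")
        w = self_paced_weights(per_sample, threshold)
        loss = (w * per_sample).sum() / w.sum().clamp(min=1.0)
        opt.zero_grad(); loss.backward(); opt.step()
        threshold *= growth                    # admit harder samples over time

In this reading, "scheduled" refers to the curriculum imposed by the growing threshold: the student first fits dialogue states on which it already agrees with the teacher, which tends to stabilise training before harder states are introduced.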
Pages: 28
Related Papers
50 records in total
  • [31] Lin, Yu-e; Liang, Xingzhu; Hu, Gan; Fang, Xianjin. Smarter peer learning for online knowledge distillation. Multimedia Systems, 2022, 28(3): 1059-1067.
  • [32] Jiang, Ning; Tang, Jialiang; Yu, Wenxin. Positive-Unlabeled Learning for Knowledge Distillation. Neural Processing Letters, 2023, 55(3): 2613-2631.
  • [33] Li, Xinjia; Chen, Boyu; Lu, Wenlian. FedDKD: Federated learning with decentralized knowledge distillation. Applied Intelligence, 2023, 53: 18547-18563.
  • [34] Ying, Peng; Li, Zhongnian; Sun, Renke; Xu, Xinzheng. Complementary label learning based on knowledge distillation. Mathematical Biosciences and Engineering, 2023, 20(10): 17905-17918.
  • [36] Kuang, Zhejun; Wang, Jingrui; Sun, Dawen; Zhao, Jian; Shi, Lijuan; Xiong, Xingbo. Incremental attribute learning by knowledge distillation method. Journal of Computational Design and Engineering, 2024, 11(5): 259-283.
  • [37] Michieli, Umberto; Zanuttigh, Pietro. Knowledge distillation for incremental learning in semantic segmentation. Computer Vision and Image Understanding, 2021, 205.
  • [38] Hu, Junjie; Fan, Chenyou; Jiang, Hualie; Guo, Xiyue; Gao, Yuan; Lu, Xiangyong; Lam, Tin Lun. Boosting LightWeight Depth Estimation via Knowledge Distillation. Knowledge Science, Engineering and Management (KSEM 2023), Part I, 2023, 14117: 27-39.
  • [39] Rong, Yibiao; Yang, Ziyin; Zheng, Ce; Fan, Zhun. Strabismus Detection Based on Uncertainty Estimation and Knowledge Distillation. Journal of Beijing Institute of Technology (English Edition), 2024, 33(5): 399-411.
  • [40] Huang, Chenping; Cao, Bin. Learning Dialogue Policy Efficiently Through Dyna Proximal Policy Optimization. Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2022), Part I, 2022, 460: 396-414.