Reward estimation with scheduled knowledge distillation for dialogue policy learning

被引:2
作者
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
机构
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
关键词
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation;
D O I
10.1080/09540091.2023.2174078
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods normally suffer from challenges such as sparse and delayed reward problems. Besides, with user goal unavailable in real scenarios, the reward estimator is unable to generate reward reflecting action validity and task completion. Those issues may slow down and degrade the policy learning significantly. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling the prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we propose to leverage self-paced learning to arrange meaningful training order for the student reward estimator. Comprehensive experiments on Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates the learning speed, and the task-completion success rate can be improved from 0.47%similar to 9.01% compared with several strong baselines.
引用
收藏
页数:28
相关论文
共 50 条
  • [21] Continual Learning Based on Knowledge Distillation and Representation Learning
    Chen, Xiu-Yan
    Liu, Jian-Wei
    Li, Wen-Tao
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 27 - 38
  • [22] Deep Learning-Based Eye Gaze Estimation for Automotive Applications Using Knowledge Distillation
    Orasan, Ioan Lucan
    Bublea, Adrian-Ioan
    Caleanu, Catalin Daniel
    IEEE ACCESS, 2023, 11 : 120741 - 120753
  • [23] Evaluation of Online Dialogue Policy Learning Techniques
    Papangelis, Alexandros
    Karkaletsis, Vangelis
    Makedon, Fillia
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1410 - 1415
  • [24] Personalized Decentralized Federated Learning with Knowledge Distillation
    Jeong, Eunjeong
    Kountouris, Marios
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 1982 - 1987
  • [25] Heterogeneous Knowledge Distillation Using Conceptual Learning
    Yu, Yerin
    Kim, Namgyu
    IEEE ACCESS, 2024, 12 : 52803 - 52814
  • [26] Federated Learning Algorithm Based on Knowledge Distillation
    Jiang, Donglin
    Shan, Chen
    Zhang, Zhihui
    2020 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER ENGINEERING (ICAICE 2020), 2020, : 163 - 167
  • [27] Smarter peer learning for online knowledge distillation
    Lin, Yu-e
    Liang, Xingzhu
    Hu, Gan
    Fang, Xianjin
    MULTIMEDIA SYSTEMS, 2022, 28 (03) : 1059 - 1067
  • [28] FedDKD: Federated learning with decentralized knowledge distillation
    Xinjia Li
    Boyu Chen
    Wenlian Lu
    Applied Intelligence, 2023, 53 : 18547 - 18563
  • [29] Positive-Unlabeled Learning for Knowledge Distillation
    Jiang, Ning
    Tang, Jialiang
    Yu, Wenxin
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 2613 - 2631
  • [30] Knowledge distillation in deep learning and its applications
    Alkhulaifi, Abdolmaged
    Alsahli, Fahad
    Ahmad, Irfan
    PEERJ COMPUTER SCIENCE, 2021, PeerJ Inc. (07) : 1 - 24