Reward estimation with scheduled knowledge distillation for dialogue policy learning

Cited by: 2
Authors
Qiu, Junyan [1 ]
Zhang, Haidong [2 ]
Yang, Yiping [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
Keywords
Reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
DOI
10.1080/09540091.2023.2174078
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Moreover, because the user goal is unavailable in real scenarios, the reward estimator cannot generate rewards that reflect action validity and task completion. These issues can significantly slow and degrade policy learning. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we leverage self-paced learning to arrange a meaningful training order for the student reward estimator. Comprehensive experiments on the Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates learning and improves the task-completion success rate by 0.47% to 9.01% over several strong baselines.
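The abstract describes two components: a teacher-student distillation step, where a teacher reward estimator trained with access to user goals supervises a compact student that must work without them, and a self-paced schedule that admits training samples from easy to hard. The sketch below illustrates that general recipe only; the paper's actual architectures, loss functions, and schedule are not given in this record, so `RewardEstimator`, `self_paced_weights`, `distillation_step`, the input dimensions, and the threshold schedule are all illustrative assumptions.

```python
# Minimal sketch of scheduled knowledge distillation for a reward
# estimator. All module names, dimensions, and the self-paced schedule
# are hypothetical, not the paper's published design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardEstimator(nn.Module):
    """Maps an encoded dialogue state-action pair to a scalar reward."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def self_paced_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    """Hard self-paced weighting: keep only samples whose current loss
    falls below the threshold. Raising the threshold over training admits
    harder samples gradually (an assumed, standard schedule)."""
    return (losses < threshold).float()


def distillation_step(student, teacher, batch, optimizer, threshold):
    """One training step: the student (no user-goal input) regresses the
    reward estimates of the frozen teacher (which sees the user goal),
    weighted by the self-paced curriculum."""
    x_student, x_teacher = batch  # teacher input additionally encodes the user goal
    with torch.no_grad():
        target = teacher(x_teacher)                        # teacher reward estimates
    pred = student(x_student)
    per_sample = F.mse_loss(pred, target, reduction="none")
    weights = self_paced_weights(per_sample.detach(), threshold)
    loss = (weights * per_sample).sum() / weights.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical setup: the teacher is larger and consumes the user goal,
# the student is compact and goal-free, matching the abstract's framing.
teacher = RewardEstimator(input_dim=120, hidden_dim=256)  # state+action+goal
student = RewardEstimator(input_dim=100, hidden_dim=64)   # state+action only
```

In practice the threshold would be increased each epoch so that the student first fits the teacher's confidently estimated rewards before harder dialogues are admitted; this is the standard self-paced learning recipe, not necessarily the paper's exact schedule.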
Pages: 28