Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0
Authors
Gao, Yang [1 ]
Meyer, Christian M. [2 ]
Mesgar, Mohsen [2 ]
Gurevych, Iryna [2 ]
Affiliations
[1] Royal Holloway, University of London, Department of Computer Science, London, England
[2] Technische Universität Darmstadt, Ubiquitous Knowledge Processing (UKP) Lab, Darmstadt, Germany
Keywords: none listed
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
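Below is a minimal illustrative sketch of the two-stage RELIS idea described in the abstract. It is hedged: the toy feature map, the pairwise hinge-loss L2R trainer, and the REINFORCE-style test-time loop are assumptions chosen for brevity, not the authors' implementation. At training time a reward function is fitted from ranked summary pairs; at test time that learned reward, rather than any reference summary, drives a small input-specific policy that selects sentences for an unseen input.

# Sketch of the RELIS paradigm (illustrative assumptions throughout: the
# feature map, the pairwise L2R trainer and the REINFORCE loop below are
# stand-ins, not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)


def features(summary_sents, all_sents):
    # Hypothetical summary features: vocabulary coverage, relative length, bias.
    doc_vocab = set(w for s in all_sents for w in s.split())
    sum_vocab = set(w for s in summary_sents for w in s.split())
    coverage = len(sum_vocab & doc_vocab) / max(1, len(doc_vocab))
    length = len(summary_sents) / max(1, len(all_sents))
    return np.array([coverage, length, 1.0])


def train_l2r_reward(ranked_pairs, dim=3, epochs=50, lr=0.1):
    # Pairwise hinge-loss L2R: push w.phi(better) above w.phi(worse) by a margin.
    w = np.zeros(dim)
    for _ in range(epochs):
        for phi_better, phi_worse in ranked_pairs:
            if w @ phi_better - w @ phi_worse < 1.0:
                w += lr * (phi_better - phi_worse)
    return w


def train_input_specific_policy(sents, w, k=2, episodes=300, lr=0.5):
    # REINFORCE over per-sentence inclusion logits; the reward is the learned
    # w.phi, so no reference summary is needed at test time.
    logits = np.zeros(len(sents))
    baseline = 0.0
    for _ in range(episodes):
        probs = 1.0 / (1.0 + np.exp(-logits))
        picks = rng.random(len(sents)) < probs
        chosen = [s for s, p in zip(sents, picks) if p][:k]
        reward = w @ features(chosen, sents)
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log-prob for independent Bernoulli choices: pick - prob.
        logits += lr * (picks.astype(float) - probs) * (reward - baseline)
    return logits


if __name__ == "__main__":
    # Training cluster: one document with two candidate summaries, where the
    # first is assumed ranked above the second by a reference-based metric.
    doc = ["the cat sat on the mat", "dogs bark loudly",
           "the mat was red", "unrelated sentence about weather"]
    better, worse = [doc[0], doc[2]], [doc[3]]
    w = train_l2r_reward([(features(better, doc), features(worse, doc))])

    # Test time: unseen input, summarised using the learned reward only.
    test_doc = ["solar panels cut energy bills", "the sky is blue today",
                "panels pay back their cost in years", "bills keep rising"]
    logits = train_input_specific_policy(test_doc, w)
    top = sorted(np.argsort(-logits)[:2])
    print("Selected sentences:", [test_doc[i] for i in top])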
Pages: 2350-2356 (7 pages)