Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0

Authors:
Gao, Yang [1 ]
Meyer, Christian M. [2 ]
Mesgar, Mohsen [2 ]
Gurevych, Iryna [2 ]
Affiliations:
[1] Royal Holloway Univ London, Dept Comp Sci, London, England
[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning because of the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but it has so far depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them.
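The two-stage idea in the abstract can be sketched in a toy form: first learn a reward function from ranked summary pairs with a pairwise Learning-to-Rank update, then optimise an input-specific policy against that learned reward at test time. This is a minimal illustration only, not the authors' implementation: the feature set, function names, and the greedy hill-climb standing in for the RL step are all illustrative assumptions.

```python
# Toy sketch of the RELIS two-stage paradigm (illustrative, not the paper's code).
# Stage 1: fit a reward function from summary preference pairs (pairwise perceptron
#          as a stand-in for an L2R algorithm).
# Stage 2: at test time, train an input-specific "policy" against the learned
#          reward (greedy hill-climbing as a stand-in for RL).

def features(summary, doc_sents):
    # Trivial features: fraction of document vocabulary covered by the chosen
    # sentences, plus a length penalty around a target of 2 sentences.
    vocab = set(w for s in doc_sents for w in s.split())
    covered = set(w for i in summary for w in doc_sents[i].split())
    return [len(covered & vocab) / max(len(vocab), 1),
            -abs(len(summary) - 2)]

def train_l2r_reward(pairs, doc_sents, epochs=50, lr=0.1):
    # Pairwise perceptron: whenever the "better" summary does not score above
    # the "worse" one, nudge the weights toward the feature difference.
    w = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in pairs:
            fb = features(better, doc_sents)
            fw = features(worse, doc_sents)
            margin = sum(wi * (b - c) for wi, b, c in zip(w, fb, fw))
            if margin <= 0:
                w = [wi + lr * (b - c) for wi, b, c in zip(w, fb, fw)]
    # The learned reward scores any candidate summary for this input.
    return lambda summary: sum(
        wi * fi for wi, fi in zip(w, features(summary, doc_sents)))

def input_specific_policy(doc_sents, reward, budget=2):
    # Test-time optimisation against the learned reward: greedily add the
    # sentence index that most improves the reward, up to the budget.
    summary = []
    for _ in range(budget):
        best = max((i for i in range(len(doc_sents)) if i not in summary),
                   key=lambda i: reward(summary + [i]))
        summary.append(best)
    return sorted(summary)
```

In the actual paper the second stage is a genuine RL policy trained per input against the learned reward; the greedy loop above merely shows where that optimisation plugs in.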
Pages: 2350-2356 (7 pages)
Related Papers (50 total)
  • [1] Deep reinforcement learning for extractive document summarization
    Yao, Kaichun
    Zhang, Libo
    Luo, Tiejian
    Wu, Yanjun
    NEUROCOMPUTING, 2018, 284 : 52 - 62
  • [2] RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
    Parnell, Jacob
    Unanue, Inigo Jauregi
    Piccardi, Massimo
    SPNLP 2021: THE 5TH WORKSHOP ON STRUCTURED PREDICTION FOR NLP, 2021, : 1 - 11
  • [3] Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
    Kong, Dingwen
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Uncertainty Estimation based Intrinsic Reward For Efficient Reinforcement Learning
    Chen, Chao
    Wan, Tianjiao
    Shi, Peichang
    Ding, Bo
    Gao, Zijian
    Feng, Dawei
    2022 IEEE 13TH INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2022), 2022, : 1 - 8
  • [5] Information Directed Reward Learning for Reinforcement Learning
    Lindner, David
    Turchetta, Matteo
    Tschiatschek, Sebastian
    Ciosek, Kamil
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Reinforcement learning reward functions for unsupervised learning
    Fyfe, Colin
    Lai, Pei Ling
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 1, PROCEEDINGS, 2007, 4491 : 397 - +
  • [7] Reward Reports for Reinforcement Learning
    Gilbert, Thomas Krendl
    Lambert, Nathan
    Dean, Sarah
    Zick, Tom
    Snoswell, Aaron
    Mehta, Soham
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023, : 84 - 130
  • [8] Reward, motivation, and reinforcement learning
    Dayan, P
    Balleine, BW
    NEURON, 2002, 36 (02) : 285 - 298
  • [9] Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward
    Xu, Tengyu
    Wang, Yue
    Zou, Shaofeng
    Liang, Yingbin
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (09) : 6481 - 6518
  • [10] Efficient Average Reward Reinforcement Learning Using Constant Shifting Values
    Yang, Shangdong
    Gao, Yang
    An, Bo
    Wang, Hao
    Chen, Xingguo
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2258 - 2264