Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0

Authors:
Gao, Yang [1 ]
Meyer, Christian M. [2 ]
Mesgar, Mohsen [2 ]
Gurevych, Iryna [2 ]
Affiliations:
[1] Royal Holloway Univ London, Dept Comp Sci, London, England
[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning because of the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but it has so far depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces training time by two orders of magnitude compared to state-of-the-art models while performing on par with them.
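The two-stage idea in the abstract can be sketched in a toy form: first learn a reward function from ranked summary pairs with a pairwise Learning-to-Rank update, then optimise an input-specific policy against that learned reward at test time. This is a minimal illustration only, not the authors' implementation: the feature set, function names, and the greedy hill-climb standing in for the RL step are all illustrative assumptions.

```python
# Toy sketch of the RELIS two-stage paradigm (illustrative, not the paper's code).
# Stage 1: fit a reward function from summary preference pairs (pairwise perceptron
#          as a stand-in for an L2R algorithm).
# Stage 2: at test time, train an input-specific "policy" against the learned
#          reward (greedy hill-climbing as a stand-in for RL).

def features(summary, doc_sents):
    # Trivial features: fraction of document vocabulary covered by the chosen
    # sentences, plus a length penalty around a target of 2 sentences.
    vocab = set(w for s in doc_sents for w in s.split())
    covered = set(w for i in summary for w in doc_sents[i].split())
    return [len(covered & vocab) / max(len(vocab), 1),
            -abs(len(summary) - 2)]

def train_l2r_reward(pairs, doc_sents, epochs=50, lr=0.1):
    # Pairwise perceptron: whenever the "better" summary does not score above
    # the "worse" one, nudge the weights toward the feature difference.
    w = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in pairs:
            fb = features(better, doc_sents)
            fw = features(worse, doc_sents)
            margin = sum(wi * (b - c) for wi, b, c in zip(w, fb, fw))
            if margin <= 0:
                w = [wi + lr * (b - c) for wi, b, c in zip(w, fb, fw)]
    # The learned reward scores any candidate summary for this input.
    return lambda summary: sum(
        wi * fi for wi, fi in zip(w, features(summary, doc_sents)))

def input_specific_policy(doc_sents, reward, budget=2):
    # Test-time optimisation against the learned reward: greedily add the
    # sentence index that most improves the reward, up to the budget.
    summary = []
    for _ in range(budget):
        best = max((i for i in range(len(doc_sents)) if i not in summary),
                   key=lambda i: reward(summary + [i]))
        summary.append(best)
    return sorted(summary)
```

In the actual paper the second stage is a genuine RL policy trained per input against the learned reward; the greedy loop above merely shows where that optimisation plugs in.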
Pages: 2350-2356 (7 pages)
Related Papers (50 total)
  • [1] Deep reinforcement learning for extractive document summarization
    Yao, Kaichun
    Zhang, Libo
    Luo, Tiejian
    Wu, Yanjun
    NEUROCOMPUTING, 2018, 284 : 52 - 62
  • [2] RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
    Parnell, Jacob
    Unanue, Inigo Jauregi
    Piccardi, Massimo
    SPNLP 2021: THE 5TH WORKSHOP ON STRUCTURED PREDICTION FOR NLP, 2021, : 1 - 11
  • [3] Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
    Kong, Dingwen
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Uncertainty Estimation based Intrinsic Reward For Efficient Reinforcement Learning
    Chen, Chao
    Wan, Tianjiao
    Shi, Peichang
    Ding, Bo
    Gao, Zijian
    Feng, Dawei
    2022 IEEE 13TH INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2022), 2022, : 1 - 8
  • [5] Information Directed Reward Learning for Reinforcement Learning
    Lindner, David
    Turchetta, Matteo
    Tschiatschek, Sebastian
    Ciosek, Kamil
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Reinforcement learning reward functions for unsupervised learning
    Fyfe, Colin
    Lai, Pei Ling
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 1, PROCEEDINGS, 2007, 4491 : 397 - +
  • [7] Reward Reports for Reinforcement Learning
    Gilbert, Thomas Krendl
    Lambert, Nathan
    Dean, Sarah
    Zick, Tom
    Snoswell, Aaron
    Mehta, Soham
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023, : 84 - 130
  • [8] Reward, motivation, and reinforcement learning
    Dayan, P
    Balleine, BW
    NEURON, 2002, 36 (02) : 285 - 298
  • [9] Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward
    Xu, Tengyu
    Wang, Yue
    Zou, Shaofeng
    Liang, Yingbin
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (09) : 6481 - 6518
  • [10] Efficient Average Reward Reinforcement Learning Using Constant Shifting Values
    Yang, Shangdong
    Gao, Yang
    An, Bo
    Wang, Hao
    Chen, Xingguo
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2258 - 2264