Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0

Authors:

Gao, Yang [1]

Meyer, Christian M. [2]

Mesgar, Mohsen [2]

Gurevych, Iryna [2]

Affiliations:

[1] Royal Holloway Univ London, Dept Comp Sci, London, England

[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
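
The two-phase idea in the abstract (learn a reward by ranking at training time, then fit an input-specific policy at test time) can be illustrated with a minimal sketch. This is not the authors' implementation: the two features, the perceptron-style pairwise ranker, the epsilon-greedy value update standing in for the RL algorithm, and all names (`features`, `learn_reward`, `summarise`) are assumptions made for illustration only.

```python
import random

def features(summary, doc_words):
    """Hypothetical features: vocabulary coverage and word redundancy."""
    words = [w for s in summary for w in s.split()]
    if not words:
        return [0.0, 0.0]
    coverage = len(set(words) & doc_words) / len(doc_words)
    redundancy = 1.0 - len(set(words)) / len(words)
    return [coverage, redundancy]

def learn_reward(pairs, epochs=50, lr=0.1):
    """Training time: pairwise ranking over (better_feats, worse_feats) pairs.
    A perceptron-style stand-in for a Learning-to-Rank algorithm."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * d for wi, d in zip(w, diff))
            if margin <= 0:  # pair is mis-ranked (or tied): nudge the weights
                w = [wi + lr * d for wi, d in zip(w, diff)]
    return w

def reward(summary, doc_words, w):
    """Learned linear reward for a candidate summary."""
    return sum(wi * fi for wi, fi in zip(w, features(summary, doc_words)))

def summarise(sentences, w, budget=2, episodes=200, eps=0.2, alpha=0.1, seed=0):
    """Test time: fit per-sentence values for this one input only, using
    epsilon-greedy sampling and the learned reward as the only signal."""
    rng = random.Random(seed)
    doc_words = {word for s in sentences for word in s.split()}
    q = [0.0] * len(sentences)
    best, best_r = None, float("-inf")
    for _ in range(episodes):
        if rng.random() < eps:
            chosen = rng.sample(range(len(sentences)), budget)  # explore
        else:
            chosen = sorted(range(len(sentences)), key=lambda i: -q[i])[:budget]
        summary = [sentences[i] for i in sorted(chosen)]
        r = reward(summary, doc_words, w)
        for i in chosen:  # move each chosen sentence's value towards the reward
            q[i] += alpha * (r - q[i])
        if r > best_r:
            best, best_r = summary, r
    return best

# Toy usage: preference pairs are given directly as feature vectors.
pairs = [([0.8, 0.1], [0.3, 0.4]), ([0.6, 0.0], [0.5, 0.5])]
w = learn_reward(pairs)
docs = ["the cat sat on the mat", "the cat sat",
        "a dog barked at the cat", "the mat was red"]
summary = summarise(docs, w, budget=2)
```

In this toy setup the reward is trained once and reused for any new input, while `summarise` starts from scratch for each document set, which mirrors the split between training-time reward learning and test-time policy learning described above.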

Pages: 2350-2356

Page count: 7

50 items in total

[41] Skill Reward for Safe Deep Reinforcement Learning
Cheng, Jiangchang
Yu, Fumin
Zhang, Hongliang
Dai, Yinglong
UBIQUITOUS SECURITY, 2022, 1557 : 203 - 213
[42] On the Power of Global Reward Signals in Reinforcement Learning
Kemmerich, Thomas
Buening, Hans Kleine
MULTIAGENT SYSTEM TECHNOLOGIES, 2011, 6973 : 53 - +
[43] Option compatible reward inverse reinforcement learning
Hwang, Rakhoon
Lee, Hanjin
Hwang, Hyung Ju
PATTERN RECOGNITION LETTERS, 2022, 154 : 83 - 89
[44] DISCRIMINATION OF REWARD IN LEARNING WITH PARTIAL AND CONTINUOUS REINFORCEMENT
HULSE, SH
JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 1962, 64 (03) : 227 - &
[45] Evolution of an Internal Reward Function for Reinforcement Learning
Zuo, Weiyi
Pedersen, Joachim Winther
Risi, Sebastian
PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023, : 351 - 354
[46] Reinforcement learning with nonstationary reward depending on the episode
Shibuya, Takeshi
Yasunobu, Seiji
2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 2145 - 2150
[47] Inverse Reinforcement Learning with the Average Reward Criterion
Wu, Feiyang
Ke, Jingyang
Wu, Anqi
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[48] THE ROLE OF SECONDARY REINFORCEMENT IN DELAYED REWARD LEARNING
SPENCE, KW
PSYCHOLOGICAL REVIEW, 1947, 54 (01) : 1 - 8
[49] Balancing multiple sources of reward in reinforcement learning
Shelton, CR
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1082 - 1088
[50] IMMEDIATE REINFORCEMENT IN DELAYED REWARD LEARNING IN PIGEONS
WINTER, J
PERKINS, CC
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR, 1982, 38 (02) : 169 - 179
