Quark: Controllable Text Generation with Reinforced [Un]learning

Cited by: 0
Authors
Lu, Ximing [1 ,2 ]
Welleck, Sean [1 ,2 ]
Hessel, Jack [1 ]
Jiang, Liwei [1 ,2 ]
Qin, Lianhui [2 ]
West, Peter [2 ]
Ammanabrolu, Prithviraj [1 ]
Choi, Yejin [1 ,2 ]
Affiliations
[1] Allen Inst Artificial Intelligence, Seattle, WA 98103 USA
[2] Univ Washington, Paul G Allen Sch Comp Sci, Seattle, WA 98195 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining nearby the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO [66], while relying only on standard language modeling primitives.
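The three-step loop described in the abstract can be pictured as follows. This is a minimal illustrative sketch, assuming a Hugging Face-style causal language model interface (model, ref_model, tokenizer), a user-supplied reward_fn, and illustrative hyperparameters (number of quantiles, KL coefficient, sampling settings); these names and values are assumptions for illustration, not the authors' released implementation.

    # Illustrative sketch of one Quark iteration: sample, quantize rewards,
    # fine-tune on reward-token-conditioned samples with a KL penalty.
    # All names and hyperparameters below are assumptions, not the paper's code.
    import torch
    import torch.nn.functional as F

    def quark_iteration(model, ref_model, tokenizer, prompts, reward_fn,
                        optimizer, num_quantiles=5, kl_coef=0.05):
        # (i) Sample continuations from the current policy.
        samples, rewards = [], []
        for prompt in prompts:
            ids = tokenizer(prompt, return_tensors="pt").input_ids
            out = model.generate(ids, do_sample=True, top_p=0.9, max_new_tokens=32)
            samples.append(tokenizer.decode(out[0], skip_special_tokens=True))
            rewards.append(reward_fn(samples[-1]))

        # (ii) Sort samples into quantiles by reward; quantile k is marked by a
        # reward token "<RQ_k>" (the paper adds such tokens to the vocabulary;
        # here they are plain-text prefixes for simplicity).
        order = sorted(range(len(samples)), key=lambda i: rewards[i])
        per_q = max(1, len(samples) // num_quantiles)
        quantile_of = {i: min(rank // per_q, num_quantiles - 1)
                       for rank, i in enumerate(order)}

        # (iii) Standard LM loss on each sample conditioned on its reward token,
        # plus a KL penalty keeping the policy near the frozen initial model.
        total_loss = 0.0
        for i, text in enumerate(samples):
            ids = tokenizer(f"<RQ_{quantile_of[i]}> {text}",
                            return_tensors="pt").input_ids
            logits = model(ids).logits
            lm_loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                ids[:, 1:].reshape(-1))
            with torch.no_grad():
                ref_log_probs = F.log_softmax(ref_model(ids).logits, dim=-1)
            # KL(policy || reference), summed over tokens
            kl = F.kl_div(ref_log_probs, F.log_softmax(logits, dim=-1),
                          log_target=True, reduction="batchmean")
            total_loss = total_loss + lm_loss + kl_coef * kl

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

At generation time, the prompt would be prefixed with the highest-reward token (e.g. "<RQ_4>") so that the model produces text exhibiting less of the unwanted property.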
Pages: 19
Related papers
50 records in total
  • [1] Adversarial Imitation Learning with Controllable Rewards for Text Generation
    Nishikino, Keizaburo
    Kobayashi, Kenichi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT I, 2023, 14169 : 131 - 146
  • [2] CLICK: Controllable Text Generation with Sequence Likelihood Contrastive Learning
    Zheng, Chujie
    Ke, Pei
    Zhang, Zheng
    Huang, Minlie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1022 - 1040
  • [3] A Causal Lens for Controllable Text Generation
    Hu, Zhiting
    Li, Li Erran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Controllable Text Generation with Focused Variation
    Shu, Lei
    Papangelis, Alexandros
    Wang, Yi-Chia
    Tur, Gokhan
    Xu, Hu
    Feizollahi, Zhaleh
    Liu, Bing
    Molino, Piero
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3805 - 3817
  • [6] Controllable Text Layout Generation For Synthesizing Scene Text Image
    Chen, Huen
    He, Jiangyang
    Zhu, Anna
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 147 - 161
  • [7] XFBoost: Improving Text Generation with Controllable Decoders
    Peng, Xiangyu
    Sollami, Michael
    arXiv, 2022,
  • [8] Focused Prefix Tuning for Controllable Text Generation
    Ma, Congda
    Zhao, Tianyu
    Shing, Makoto
    Sawada, Kei
    Okumura, Manabu
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1116 - 1127
  • [9] Intent-Controllable Citation Text Generation
    Jung, Shing-Yun
    Lin, Ting-Han
    Liao, Chia-Hung
    Yuan, Shyan-Ming
    Sun, Chuen-Tsai
    MATHEMATICS, 2022, 10 (10)
  • [10] Controllable Text Generation with Residual Memory Transformer
    Zhang, Hanqing
    Sun, Si
    Wu, Haiming
    Song, Dawei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1048 - 1066