Quark: Controllable Text Generation with Reinforced [Un]learning

Cited by: 0
Authors
Lu, Ximing [1 ,2 ]
Welleck, Sean [1 ,2 ]
Hessel, Jack [1 ]
Jiang, Liwei [1 ,2 ]
Qin, Lianhui [2 ]
West, Peter [2 ]
Ammanabrolu, Prithviraj [1 ]
Choi, Yejin [1 ,2 ]
Affiliations
[1] Allen Inst Artificial Intelligence, Seattle, WA 98103 USA
[2] Univ Washington, Paul G Allen Sch Comp Sci, Seattle, WA 98195 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining nearby the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO [66], while relying only on standard language modeling primitives.
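The three-step loop described in the abstract can be pictured as follows. This is a minimal illustrative sketch, assuming a Hugging Face-style causal language model interface (model, ref_model, tokenizer), a user-supplied reward_fn, and illustrative hyperparameters (number of quantiles, KL coefficient, sampling settings); these names and values are assumptions for illustration, not the authors' released implementation.

    # Illustrative sketch of one Quark iteration: sample, quantize rewards,
    # fine-tune on reward-token-conditioned samples with a KL penalty.
    # All names and hyperparameters below are assumptions, not the paper's code.
    import torch
    import torch.nn.functional as F

    def quark_iteration(model, ref_model, tokenizer, prompts, reward_fn,
                        optimizer, num_quantiles=5, kl_coef=0.05):
        # (i) Sample continuations from the current policy.
        samples, rewards = [], []
        for prompt in prompts:
            ids = tokenizer(prompt, return_tensors="pt").input_ids
            out = model.generate(ids, do_sample=True, top_p=0.9, max_new_tokens=32)
            samples.append(tokenizer.decode(out[0], skip_special_tokens=True))
            rewards.append(reward_fn(samples[-1]))

        # (ii) Sort samples into quantiles by reward; quantile k is marked by a
        # reward token "<RQ_k>" (the paper adds such tokens to the vocabulary;
        # here they are plain-text prefixes for simplicity).
        order = sorted(range(len(samples)), key=lambda i: rewards[i])
        per_q = max(1, len(samples) // num_quantiles)
        quantile_of = {i: min(rank // per_q, num_quantiles - 1)
                       for rank, i in enumerate(order)}

        # (iii) Standard LM loss on each sample conditioned on its reward token,
        # plus a KL penalty keeping the policy near the frozen initial model.
        total_loss = 0.0
        for i, text in enumerate(samples):
            ids = tokenizer(f"<RQ_{quantile_of[i]}> {text}",
                            return_tensors="pt").input_ids
            logits = model(ids).logits
            lm_loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                ids[:, 1:].reshape(-1))
            with torch.no_grad():
                ref_log_probs = F.log_softmax(ref_model(ids).logits, dim=-1)
            # KL(policy || reference), summed over tokens
            kl = F.kl_div(ref_log_probs, F.log_softmax(logits, dim=-1),
                          log_target=True, reduction="batchmean")
            total_loss = total_loss + lm_loss + kl_coef * kl

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

At generation time, the prompt would be prefixed with the highest-reward token (e.g. "<RQ_4>") so that the model produces text exhibiting less of the unwanted property.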
Pages: 19
Related papers
50 records in total
  • [1] Adversarial Imitation Learning with Controllable Rewards for Text Generation
    Nishikino, Keizaburo
    Kobayashi, Kenichi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT I, 2023, 14169 : 131 - 146
  • [2] CLICK: Controllable Text Generation with Sequence Likelihood Contrastive Learning
    Zheng, Chujie
    Ke, Pei
    Zhang, Zheng
    Huang, Minlie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1022 - 1040
  • [3] A Causal Lens for Controllable Text Generation
    Hu, Zhiting
    Li, Li Erran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Controllable Text Generation with Focused Variation
    Shu, Lei
    Papangelis, Alexandros
    Wang, Yi-Chia
    Tur, Gokhan
    Xu, Hu
    Feizollahi, Zhaleh
    Liu, Bing
    Molino, Piero
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3805 - 3817
  • [6] Controllable Text Layout Generation For Synthesizing Scene Text Image
    Chen, Huen
    He, Jiangyang
    Zhu, Anna
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 147 - 161
  • [7] XFBoost: Improving Text Generation with Controllable Decoders
    Peng, Xiangyu
    Sollami, Michael
    arXiv, 2022,
  • [8] Focused Prefix Tuning for Controllable Text Generation
    Ma, Congda
    Zhao, Tianyu
    Shing, Makoto
    Sawada, Kei
    Okumura, Manabu
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1116 - 1127
  • [9] Intent-Controllable Citation Text Generation
    Jung, Shing-Yun
    Lin, Ting-Han
    Liao, Chia-Hung
    Yuan, Shyan-Ming
    Sun, Chuen-Tsai
    MATHEMATICS, 2022, 10 (10)
  • [10] Controllable Text Generation with Residual Memory Transformer
    Zhang, Hanqing
    Sun, Si
    Wu, Haiming
    Song, Dawei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1048 - 1066