NANO: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

Times cited: 0
Authors
Fan, Xiang [1 ]
Lyu, Yiwei [2 ]
Liang, Paul Pu [1 ]
Salakhutdinov, Ruslan [1 ]
Morency, Louis-Philippe [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Univ Michigan, Ann Arbor, MI USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023) | 2023
Funding
U.S. National Science Foundation; Andrew W. Mellon Foundation (USA); U.S. National Institutes of Health;
Keywords
NATURAL-LANGUAGE; GENERATION; IF;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. However, many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing NANO, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. NANO achieves state-of-the-art results on single topic/attribute as well as quantified distribution control compared to previous works. We also show that NANO is able to learn unquantified distributions, achieves personalization, and captures differences between different individuals' personal preferences with high sample efficiency. Our code is available at https://github.com/sfanxiang/Nano.
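To make the abstract's description concrete, the sketch below shows a generic human-in-the-loop feedback loop for steering a language model: sample a few generations, collect human judgments, and update the model on the approved samples. This is an illustrative sketch only, not the paper's NANO algorithm; the base model (gpt2), the binary 0/1 feedback format, and the reward-weighted likelihood update are assumptions introduced purely for exposition.

```python
# Illustrative human-in-the-loop control loop for text generation.
# NOT the NANO algorithm from the paper; model choice, feedback format,
# and the update rule below are assumptions made for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def generate_samples(prompt, n=4, max_new_tokens=40):
    """Sample a small batch of continuations for human review."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def collect_feedback(samples):
    """Ask a human to score each sample (1 = matches the desired
    distribution/preference, 0 = does not)."""
    scores = []
    for s in samples:
        print("\n---\n" + s)
        scores.append(float(input("Score (0 or 1): ").strip() or 0))
    return scores

def finetune_on_feedback(samples, scores):
    """Reward-weighted likelihood update: reinforce samples the human
    approved (a crude stand-in for a learned reward objective)."""
    model.train()
    for text, score in zip(samples, scores):
        if score <= 0:
            continue  # this sketch only reinforces approved samples
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss * score
        loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Outer loop: a few rounds of generate -> human feedback -> update,
# mirroring the continuous-feedback setting described in the abstract.
for round_idx in range(3):
    samples = generate_samples("Write a short restaurant review:")
    scores = collect_feedback(samples)
    finetune_on_feedback(samples, scores)
```

In this toy setting, a handful of feedback rounds already shifts the sampling distribution toward the human's preferences; the paper's actual method and results are in the linked repository and proceedings pages below.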
Pages: 11970-11992
Page count: 23