Evaluating Large Language Models on Controlled Generation Tasks

Cited by: 0
Authors
Sun, Jiao [1]
Tian, Yufei [2]
Zhou, Wangchunshu [3]
Xu, Nan [1]
Hu, Qian [4]
Gupta, Rahul [4]
Wieting, John [5]
Peng, Nanyun [2]
Ma, Xuezhe [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] Amazon, Seattle, WA USA
[5] Google DeepMind, London, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While recent studies have looked into the abilities of large language models in various benchmark tasks, few studies have looked into the controllability of large language models on generation tasks. We present a systematic and extensive analysis of the controllability of large language models on ten benchmarks, including a new simple yet challenging numerical planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing when large language models fall behind, are comparable, or exceed the ability of smaller models. We conclude that large language models struggle at meeting fine-grained hard constraints.
Pages: 3155-3168
Page count: 14