Evaluating Large Language Models on Controlled Generation Tasks

Cited by: 0
Authors
Sun, Jiao [1]
Tian, Yufei [2]
Zhou, Wangchunshu [3]
Xu, Nan [1]
Hu, Qian [4]
Gupta, Rahul [4]
Wieting, John [5]
Peng, Nanyun [2]
Ma, Xuezhe [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90007 USA
[2] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] Amazon, Seattle, WA USA
[5] Google DeepMind, London, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
While recent studies have looked into the abilities of large language models in various benchmark tasks, few studies have looked into the controllability of large language models on generation tasks. We present a systematic and extensive analysis of the controllability of large language models on ten benchmarks, including a new simple yet challenging numerical planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing when large language models fall behind, are comparable, or exceed the ability of smaller models. We conclude that large language models struggle at meeting fine-grained hard constraints.
Pages: 3155-3168
Page count: 14
Related papers
50 in total
  • [31] JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
    Cao, Jialun
    Chen, Zhiyong
    Wu, Jiarong
    Cheung, Shing-Chi
    Xu, Chang
    PROCEEDINGS OF 2024 39TH ACM/IEEE INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2024, 2024, : 870 - 882
  • [32] Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
    Hakimov, Sherzod
    Schlangen, David
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14196 - 14210
  • [33] EVALUATING LARGE LANGUAGE MODELS' (LLM) PERFORMANCE IN CONTENT GENERATION FOR GLOBAL VALUE DOSSIERS (GVD)
    Walters, J.
    Rtveladze, K.
    Xu, W.
    Green, N.
    Joseph, J.
    Matev, K.
    Gallinaro, J.
    Guerra, I
    VALUE IN HEALTH, 2024, 27 (12)
  • [34] Evaluating large language models for health-related text classification tasks with public social media data
    Guo, Yuting
    Ovadje, Anthony
    Al-Garadi, Mohammed Ali
    Sarker, Abeed
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (10) : 2181 - 2189
  • [35] Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks
    Rodrigues, Ruan Chaves
    Rodrigues, Jessica
    Quinta de Castro, Pedro Vitor
    Felipe da Silva, Nadia Felix
    Soares, Anderson
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2020, 2020, 12037 : 239 - 248
  • [36] Towards an understanding of large language models in software engineering tasks
    Zheng, Zibin
    Ning, Kaiwen
    Zhong, Qingyuan
    Chen, Jiachi
    Chen, Wenqing
    Guo, Lianghong
    Wang, Weicheng
    Wang, Yanlin
    EMPIRICAL SOFTWARE ENGINEERING, 2025, 30 (02)
  • [37] Multimodal large language models for inclusive collaboration learning tasks
    Lewis, Armanda
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 202 - 210
  • [38] Reasoning with Large Language Models on Graph Tasks: The Influence of Temperature
    Wang, Yiming
    Zhang, Ziyang
    Chen, Hanwei
    Shen, Huayi
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 630 - 634
  • [39] Evaluating Multimedia and Language Tasks
    Soboroff, Ian
    Awad, George
    Butt, Asad
    Curtis, Keith
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2020, 3
  • [40] Challenges in applying large language models to requirements engineering tasks
    Norheim, Johannes J.
    Rebentisch, Eric
    Xiao, Dekai
    Draeger, Lorenz
    Kerbrat, Alain
    de Weck, Olivier L.
    DESIGN SCIENCE, 2024, 10