LLM4Eval: Large Language Model for Evaluation in IR

Cited by: 1
Authors
Rahmani, Hossein A. [1]
Siro, Clemencia [2]
Aliannejadi, Mohammad [2]
Craswell, Nick [3]
Clarke, Charles L. A. [4]
Faggioli, Guglielmo [5]
Mitra, Bhaskar [6]
Thomas, Paul [7]
Yilmaz, Emine [1, 8]
Affiliations
[1] UCL, London, England
[2] Univ Amsterdam, Amsterdam, Netherlands
[3] Microsoft, Bellevue, WA, USA
[4] Univ Waterloo, Waterloo, ON, Canada
[5] Univ Padua, Padua, Italy
[6] Microsoft, Montreal, PQ, Canada
[7] Microsoft, Adelaide, SA, Australia
[8] Amazon, London, England
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Generative Models; Large Language Models; Automated Evaluation;
DOI
10.1145/3626772.3657992
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have demonstrated increasing task-solving abilities not present in smaller models. Utilizing the capabilities and responsibilities of LLMs for automated evaluation (LLM4Eval) has recently attracted considerable attention in multiple research communities. For instance, LLM4Eval models have been studied in the context of automated judgments, natural language generation, and retrieval-augmented generation systems. We believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks. The main goal of the LLM4Eval workshop is to bring together researchers from industry and academia to discuss various aspects of LLMs for evaluation in information retrieval, including automated judgments, retrieval-augmented generation pipeline evaluation, altering human evaluation, and the robustness and trustworthiness of LLMs for evaluation, in addition to their impact on real-world applications. We also plan to run an automated judgment challenge prior to the workshop, where participants will be asked to generate labels for a given dataset while maximising correlation with human judgments. The format of the workshop is interactive, including roundtable and keynote sessions, and tends to avoid the one-sided dialogue of a mini-conference.
Pages: 3040-3043
Page count: 4