Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Cited by: 15
Authors
Sui, Yuan [1 ,4 ]
Zhou, Mengyu [2 ]
Zhou, Mingjie [3 ,4 ]
Han, Shi [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Microsoft, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024 | 2024
Keywords
large language models; semi-structured data; structural understanding capabilities; benchmark;
DOI
10.1145/3616855.3635752
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) are becoming attractive as few-shot reasoners for solving Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be serialized as input to LLMs, there is a lack of comprehensive studies examining whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark includes seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4 and find that performance varies depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing on the insights gained from the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using the internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (↑2.31%), HybridQA (↑2.13%), SQA (↑2.72%), Feverous (↑0.84%), and ToTTo (↑5.68%). We believe that our open-source benchmark and proposed prompting methods can serve as a simple yet generic selection for future research.
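The abstract describes two concrete techniques: serializing a table with explicit partition marks (one of the input design choices the benchmark varies, alongside role prompting) and a two-pass "self-augmentation" prompt that first asks the LLM to surface critical values/ranges and then folds that output into the task prompt. The Python sketch below illustrates both under stated assumptions; it is not the paper's released code, and the prompt wording and the `ask_llm` helper are hypothetical placeholders for any chat-completion call.

```python
# Minimal sketch (assumptions, not the paper's implementation) of
# partition-mark serialization plus two-pass self-augmented prompting.

def serialize_table(headers: list[str], rows: list[list[str]]) -> str:
    """Flatten a table into text, using '|' partition marks so the
    model can recover cell boundaries."""
    lines = ["|" + "|".join(headers) + "|"]
    for row in rows:
        lines.append("|" + "|".join(str(c) for c in row) + "|")
    return "\n".join(lines)

def self_augmented_prompt(table_text: str, question: str, ask_llm) -> str:
    """Pass 1: ask the LLM to identify critical values/ranges using its
    own knowledge. Pass 2: append those hints to the final task prompt.
    `ask_llm(prompt) -> str` is a placeholder for a real LLM call."""
    hints = ask_llm(
        "You are a table analyst.\n"  # role prompting, another input choice
        f"Table:\n{table_text}\n"
        "Identify the critical values and value ranges in this table."
    )
    return (
        "You are a table analyst.\n"
        f"Table:\n{table_text}\n"
        f"Critical values/ranges (self-generated): {hints}\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    table = serialize_table(["city", "population"],
                            [["Springfield", "169000"], ["Shelbyville", "65000"]])
    # Stub LLM so the sketch runs without an API key.
    prompt = self_augmented_prompt(table, "Which city is larger?",
                                   ask_llm=lambda p: "population spans 65000-169000")
    print(prompt)
```

Running the stub prints the assembled final prompt, showing how the first-pass hints precede the question; swapping the lambda for a real chat-completion call preserves the same two-pass structure.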
Pages: 645-654
Page count: 10