SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

Cited by: 0
Authors
Li, Hongxin [1 ,2 ]
Su, Jingran [3 ,4 ]
Chen, Yuntao [3 ]
Li, Qing [4 ]
Zhang, Zhaoxiang [1 ,2 ,3 ,5 ]
Affiliations
[1] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, HKISI, Ctr Artificial Intelligence & Robot, Beijing, Peoples R China
[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
DOI
(not available)
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate this burdensome work. With the advent of large language models (LLMs), directing software with natural language user requests becomes a reachable goal. In this work, we propose SheetCopilot, an agent that takes natural language tasks and controls spreadsheets to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. Our SheetCopilot correctly completes 44.3% of tasks in a single generation, outperforming the strong code generation baseline by a wide margin. Our project page: https://sheetcopilot.github.io/.
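The abstract describes two key ideas: atomic actions as an abstraction over spreadsheet functionality, and a state-machine-style loop in which an LLM plans one action per step. A minimal sketch of that pattern is below; the action names (`Write`, `CopyPaste`), the toy `Spreadsheet` class, and the plan format are hypothetical illustrations, not the paper's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Spreadsheet:
    """Toy in-memory sheet: maps cell addresses like 'A1' to values."""
    cells: dict = field(default_factory=dict)

    def write(self, cell: str, value):
        self.cells[cell] = value

    def read(self, cell: str):
        return self.cells.get(cell)

# Atomic actions: small, named wrappers over spreadsheet functionality
# that a planner can emit one at a time (names are illustrative only).
ACTIONS = {
    "Write": lambda sheet, cell, value: sheet.write(cell, value),
    "CopyPaste": lambda sheet, src, dst: sheet.write(dst, sheet.read(src)),
}

def run_plan(sheet: Spreadsheet, plan: list):
    """State-machine-style execution: apply one atomic action per step,
    so an invalid action can be rejected and re-planned before the
    sheet state is corrupted further."""
    for name, args in plan:
        if name not in ACTIONS:
            raise ValueError(f"unknown atomic action: {name}")
        ACTIONS[name](sheet, *args)
    return sheet

sheet = run_plan(Spreadsheet(), [
    ("Write", ("A1", 42)),
    ("CopyPaste", ("A1", "B1")),
])
print(sheet.read("B1"))  # → 42
```

In the paper's setting, the plan would be produced incrementally by an LLM observing the sheet state after each step rather than supplied up front; the fixed list here only illustrates the action-dispatch mechanics.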
Pages: 33