SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

Cited by: 0
Authors
Li, Hongxin [1 ,2 ]
Su, Jingran [3 ,4 ]
Chen, Yuntao [3 ]
Li, Qing [4 ]
Zhang, Zhaoxiang [1 ,2 ,3 ,5 ]
Affiliations
[1] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing, Peoples R China
[3] Chinese Acad Sci, HKISI, Ctr Artificial Intelligence & Robot, Beijing, Peoples R China
[4] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
DOI
(not available)
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate this burdensome work. With the advent of large language models (LLMs), directing software with natural language user requests becomes a reachable goal. In this work, we propose SheetCopilot, an agent that takes natural language tasks and controls spreadsheets to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. Our SheetCopilot correctly completes 44.3% of tasks in a single generation, outperforming the strong code generation baseline by a wide margin. Our project page: https://sheetcopilot.github.io/.
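The abstract describes two key ideas: atomic actions as an abstraction over spreadsheet functionality, and a state-machine-style loop in which an LLM plans one action per step. A minimal sketch of that pattern is below; the action names (`Write`, `CopyPaste`), the toy `Spreadsheet` class, and the plan format are hypothetical illustrations, not the paper's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Spreadsheet:
    """Toy in-memory sheet: maps cell addresses like 'A1' to values."""
    cells: dict = field(default_factory=dict)

    def write(self, cell: str, value):
        self.cells[cell] = value

    def read(self, cell: str):
        return self.cells.get(cell)

# Atomic actions: small, named wrappers over spreadsheet functionality
# that a planner can emit one at a time (names are illustrative only).
ACTIONS = {
    "Write": lambda sheet, cell, value: sheet.write(cell, value),
    "CopyPaste": lambda sheet, src, dst: sheet.write(dst, sheet.read(src)),
}

def run_plan(sheet: Spreadsheet, plan: list):
    """State-machine-style execution: apply one atomic action per step,
    so an invalid action can be rejected and re-planned before the
    sheet state is corrupted further."""
    for name, args in plan:
        if name not in ACTIONS:
            raise ValueError(f"unknown atomic action: {name}")
        ACTIONS[name](sheet, *args)
    return sheet

sheet = run_plan(Spreadsheet(), [
    ("Write", ("A1", 42)),
    ("CopyPaste", ("A1", "B1")),
])
print(sheet.read("B1"))  # → 42
```

In the paper's setting, the plan would be produced incrementally by an LLM observing the sheet state after each step rather than supplied up front; the fixed list here only illustrates the action-dispatch mechanics.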
Pages: 33