Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 220

Authors:

Vaithilingam, Priyan [1]

Zhang, Tianyi [2]

Glassman, Elena L. [1]

Affiliations:

[1] Harvard Univ, Cambridge, MA 02138 USA

[2] Purdue Univ, W Lafayette, IN 47907 USA

Source:

EXTENDED ABSTRACTS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2022 | 2022

Keywords:

large language model; GitHub Copilot

DOI:

10.1145/3491101.3519665

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants' feedback.

Pages: 7

