Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Cited by: 220
Authors
Vaithilingam, Priyan [1]
Zhang, Tianyi [2]
Glassman, Elena L. [1]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
Source
EXTENDED ABSTRACTS OF THE 2022 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2022 | 2022
Keywords
large language model; GitHub Copilot
DOI
10.1145/3491101.3519665
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit into the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlight several promising directions for improving the design of Copilot based on our observations and participants' feedback.
Pages: 7