Can ChatGPT Evaluate Plans?

Cited by: 18
Authors
Fu, Xinyu [1]
Wang, Ruoniu [2]
Li, Chaosu [3,4]
Affiliations
[1] Univ Waikato, Environmental Planning, Hamilton, New Zealand
[2] Univ Washington, Runstad Dept Real Estate, Coll Built Environm, Seattle, WA USA
[3] Hong Kong Univ Sci & Technol, Urban Governance & Design Thrust, Guangzhou, Peoples R China
[4] Hong Kong Univ Sci & Technol, Div Publ Policy, Hong Kong, Peoples R China
Keywords
ChatGPT; large language model; natural language processing; plan evaluation; plan quality; sea-level rise; climate change; mitigation plans; adaptation; implementation; planners; state
DOI
10.1080/01944363.2023.2271893
CLC number
TU98 [Regional planning; urban and rural planning]
Discipline codes
0814; 082803; 0833
Abstract
Problem, research strategy, and findings: Large language models such as ChatGPT have recently risen to prominence for producing human-like conversation and assisting with a variety of tasks, particularly the analysis of high-dimensional textual material. Because planning researchers and practitioners often need to evaluate planning documents that are long and complex, a new question has emerged: Can ChatGPT evaluate plans? In this study we addressed this question by using ChatGPT to evaluate the quality of plans and comparing its results with those produced by human coders. Evaluating 10 climate change plans, we found that ChatGPT's results agreed reasonably well (68% on average) with those of the traditional content analysis approach. We then scrutinized the differences through a more in-depth analysis of the ChatGPT and manual evaluations to uncover what might have contributed to the variance. Our findings indicate that ChatGPT struggled to comprehend planning-specific jargon, yet it could reduce human error by capturing details in complex planning documents. Finally, we offer insights into leveraging this cutting-edge technology in future planning research and practice.

Takeaway for practice: ChatGPT cannot yet replace humans in plan quality evaluation. It is, however, an effective complement to human coders: it can minimize human error by identifying discrepancies, provided that its machine-generated responses are fact-checked. ChatGPT generally does not understand planning jargon, so planners wanting to use this tool should take extra care when planning terminology appears in their prompts. Creating effective prompts for ChatGPT is an iterative process that requires specific instructions.
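The 68% figure above is a simple percent-agreement statistic between ChatGPT's codes and the human coders' codes. Below is a minimal sketch of how such a comparison could be run, assuming a binary presence/absence coding protocol and the openai (>=1.0) Python client; the prompt wording, model choice, and indicator names are hypothetical and are not the authors' actual instrument.

# A minimal sketch, not the authors' pipeline: it assumes a binary
# presence/absence coding protocol; prompt, model, and indicators are
# all illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are evaluating a climate change plan. Answer strictly 'yes' or 'no': "
    "does the following plan excerpt contain the indicator '{indicator}'?\n\n{text}"
)

def gpt_code(indicator: str, text: str) -> int:
    """Ask the model whether an indicator is present; return 1 (yes) or 0 (no)."""
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption; the study's model/version may differ
        temperature=0,          # suppress sampling variation across coding runs
        messages=[{"role": "user",
                   "content": PROMPT.format(indicator=indicator, text=text)}],
    )
    answer = reply.choices[0].message.content.strip().lower()
    return 1 if answer.startswith("yes") else 0

def percent_agreement(machine: list[int], human: list[int]) -> float:
    """Share of items on which machine and human coders assign the same code."""
    assert len(machine) == len(human) and human
    return sum(m == h for m, h in zip(machine, human)) / len(human)

With machine and human codes collected for every indicator across all 10 plans, percent_agreement would yield a figure directly comparable to the 68% average reported above; as the takeaway advises, a real workflow would also fact-check each machine-generated response before accepting it.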
Pages: 525-536
Page count: 12