Smart Product Backlog: Automatic Classification of User Stories Using Large Language Models (LLM)

Cited by: 0

Authors:

Gaona-Cuevas, Mauricio ^{[1]}

Bucheli-Guerrero, Victor ^{[1]}

Vera-Rivera, Fredy ^{[2]}

Affiliations:

[1] Univ Valle, Cali, Valle Del Cauca, Colombia

[2] Univ Francisco Paula Santander, Cucuta, Norte De Santan, Colombia

Source:

REVISTA FACULTAD DE INGENIERIA, UNIVERSIDAD PEDAGOGICA Y TECNOLOGICA DE COLOMBIA | 2024 / Vol. 33 / No. 69

Keywords:

artificial intelligence; large scale language models; smart product backlog; smart user story identifier; Software Requirements Specification; user story classification;

DOI:

10.19503/01211129.v33.n69.2024.18076

Chinese Library Classification number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

In agile software development, particularly for intelligent applications that leverage artificial intelligence (AI), the Smart Product Backlog (SPB) is an artifact that includes both functionalities that can be implemented with AI and those that do not use AI. Significant work has been done on Natural Language Processing (NLP) models, and Large Language Models (LLMs) have demonstrated exceptional performance. However, whether LLMs can be used for automatic classification without prior annotation, thereby allowing direct extraction from the SPB, remains an open question. In this study, we compared the effectiveness of fine-tuning techniques with "prompting" methods to determine the potential of models such as ChatGPT-4o, Gemini Pro 1.5, and ChatGPT-Mini. A dataset of user stories manually classified by a group of experts was constructed, which made it possible to set up the experiments and build the corresponding contingency tables. The classification performance of each LLM was evaluated statistically; accuracy, sensitivity, and F1-Score were used to assess the effectiveness of each model. This comparative approach aimed to highlight the strengths and limitations of each LLM in assisting efficiently and accurately with the construction of the SPB. The analysis shows that ChatGPT-Mini has limitations in balancing precision and sensitivity. Although Gemini Pro 1.5 achieved the best accuracy scores and ChatGPT performed well, neither is robust enough to support a fully automated tool for user story classification. We therefore identified the need to develop a specialized classifier that enables an automated tool to recommend user stories that are viable for AI development, thereby supporting decision-making in agile software projects.
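The metrics named in the abstract can all be derived from a contingency table. The following is a minimal sketch, assuming a binary setup (a user story is either AI-implementable or not); the class `ContingencyTable`, the function `classification_metrics`, and the example counts are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Binary contingency table (hypothetical helper, not from the paper)."""
    tp: int  # stories labeled AI-implementable by experts and by the model
    fp: int  # stories labeled AI-implementable by the model only
    fn: int  # stories labeled AI-implementable by experts only
    tn: int  # stories labeled non-AI by both


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def classification_metrics(table: ContingencyTable) -> dict:
    """Compute accuracy, precision, sensitivity (recall), and F1-Score."""
    total = table.tp + table.fp + table.fn + table.tn
    accuracy = _ratio(table.tp + table.tn, total)
    precision = _ratio(table.tp, table.tp + table.fp)
    sensitivity = _ratio(table.tp, table.tp + table.fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Illustrative counts only (assumed, not results reported by the authors).
example = ContingencyTable(tp=40, fp=10, fn=10, tn=40)
metrics = classification_metrics(example)
```

F1-Score is the harmonic mean of precision and sensitivity, so a model that trades one for the other scores lower on F1 even when its accuracy looks acceptable.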


Pages: 14
