A machine learning approach for hierarchical classification of software requirements

Cited by: 7
Authors
Binkhonain, Manal [1]
Zhao, Liping [2]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
[2] Univ Manchester, Dept Comp Sci, Manchester M13 9PL, England
Source
MACHINE LEARNING WITH APPLICATIONS | 2023 / Vol. 12
Keywords
Requirements engineering; Requirements classification; Machine learning; Hierarchical classification; Imbalanced classes; High Dimensional Data with Low Sample Size (HDLSS); Feature selection; Text classification
DOI
10.1016/j.mlwa.2023.100457
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Context: Classification of software requirements into different categories is a critically important task in requirements engineering (RE). Developing machine learning (ML) approaches for requirements classification has attracted great interest in the RE community since the 2000s. Objective: This paper aims to address two related problems that have hindered real-world applications of ML approaches: class imbalance and high-dimensional data with low sample size (HDLSS). These problems can greatly degrade the classification performance of ML methods. Methods: The paper proposes HC4RC, a novel ML approach for multiclass classification of requirements. HC4RC addresses these problems through semantic-role-based feature selection, dataset decomposition, and hierarchical classification. We experimentally compare the effectiveness of HC4RC with three closely related approaches, two of which are based on a traditional statistical classification model and one of which uses an advanced deep learning model. Results: Our experiment shows: (1) The class imbalance and HDLSS problems present a challenge to both traditional and advanced ML approaches. (2) The HC4RC approach is simple to use and can effectively address the class imbalance and HDLSS problems compared to similar approaches. Conclusion: This paper makes an important practical contribution to addressing the class imbalance and HDLSS problems in multiclass classification of software requirements.
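The general idea of hierarchical classification named in the abstract can be illustrated with a minimal sketch. This is not the HC4RC implementation: the toy requirements, the leaf labels, the two-group hierarchy in `parent_of`, and the bag-of-words scorer `ProfileClassifier` are all assumptions made for illustration, and the semantic-role-based feature selection step is not modeled at all. The sketch only shows how splitting the data by group lets each lower-level classifier choose among fewer classes.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase whitespace tokenization (a stand-in for real feature extraction)."""
    return text.lower().split()


class ProfileClassifier:
    """Toy bag-of-words classifier: picks the label whose word profile overlaps most."""

    def fit(self, texts, labels):
        self.profiles = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.profiles[label].update(tokens(text))
        return self

    def predict(self, text):
        words = tokens(text)

        def score(label):
            profile = self.profiles[label]
            # Relative frequency of the query words in this label's profile.
            return sum(profile[w] for w in words) / sum(profile.values())

        # sorted() makes tie-breaking deterministic (alphabetical order).
        return max(sorted(self.profiles), key=score)


class TwoLevelClassifier:
    """Predict a group first, then a leaf label using that group's own classifier."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # hypothetical mapping: leaf label -> group

    def fit(self, texts, leaf_labels):
        groups = [self.parent_of[label] for label in leaf_labels]
        self.top = ProfileClassifier().fit(texts, groups)
        self.leaf = {}
        for group in set(groups):
            # Dataset decomposition: each leaf classifier sees only its group's data.
            subset = [(t, l) for t, l in zip(texts, leaf_labels)
                      if self.parent_of[l] == group]
            sub_texts, sub_labels = zip(*subset)
            self.leaf[group] = ProfileClassifier().fit(sub_texts, sub_labels)
        return self

    def predict(self, text):
        group = self.top.predict(text)
        return self.leaf[group].predict(text)


# Hypothetical toy data; real requirements datasets are larger and imbalanced.
train = [
    ("the system shall encrypt user passwords", "security"),
    ("only authorized users shall access records", "security"),
    ("the page shall load within two seconds", "performance"),
    ("the search shall respond within one second", "performance"),
    ("the interface shall be easy to learn", "usability"),
    ("the menu shall be easy to navigate", "usability"),
]
parent_of = {"security": "protection",
             "performance": "experience",
             "usability": "experience"}

texts, labels = zip(*train)
model = TwoLevelClassifier(parent_of).fit(texts, labels)
print(model.predict("the report shall load within three seconds"))
```

In this sketch a minority class such as `security` competes only against its group's siblings at the second level, which is one common motivation for decomposing an imbalanced multiclass problem. How HC4RC builds its hierarchy and selects features is described in the paper itself.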
Pages: 12