The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study

Cited by: 0
Authors
Zexian Zhang [1]
Lin Zhu [2]
Shuang Yin [3]
Wenhua Hu [3]
Shan Gao [1]
Haoxuan Chen [2]
Fuyang Li [4]
Affiliations
[1] Wuhan University of Technology, School of Computer Science and Artificial Intelligence
[2] Wuhan University of Technology, Hubei Key Laboratory of Transportation Internet of Things
[3] Wuhan Qingchuan University, School of Computer
[4] Hokkaido University
Keywords
Code smell detection; Feature selection; Feature reduction; Empirical study
DOI
10.1007/s10515-025-00524-6
Abstract
Code smell detection using machine learning or deep learning methods aims to classify code instances as smelly or non-smelly based on extracted features. Accurate detection relies on optimizing feature sets by focusing on relevant features while discarding those that are redundant or irrelevant. However, prior studies on feature selection and reduction techniques for code smell detection have yielded inconsistent results, possibly due to limited exploration of available techniques. To address this gap, we comprehensively analyze 33 feature selection and 6 feature reduction techniques across seven classification models and six code smell datasets. We apply the Scott-Knott effect size difference test to compare performance and McNemar’s test to assess prediction diversity. The results show that (1) not all feature selection and reduction techniques significantly improve detection performance; (2) feature extraction techniques generally perform worse than feature selection techniques; (3) probabilistic significance is recommended as a “generic” feature selection technique due to its higher consistency in identifying smelly instances; and (4) high-frequency features selected by the top feature selection techniques vary by dataset, highlighting their specific relevance for identifying the corresponding code smells. Based on these findings, we provide implications for further code smell detection research.
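The abstract names McNemar’s test as the tool for assessing prediction diversity. As a rough illustration of what such a comparison involves, the sketch below computes an exact two-sided McNemar test from the predictions of two detectors on the same instances. It is not the authors' implementation: the function name `mcnemar_exact`, the argument names, and the choice of the exact binomial variant (rather than the chi-squared approximation) are all assumptions made for this example.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (hypothetical helper).

    Returns (b, c, p_value), where
      b = instances detector A classifies correctly and detector B does not,
      c = instances detector B classifies correctly and detector A does not.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")

    # Only discordant pairs (exactly one detector correct) carry information.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)

    n = b + c
    if n == 0:
        # The detectors never disagree in correctness: no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    # Two-sided p-value: twice the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

A small p-value would suggest that the two detectors make errors on different instances, which is the sense in which the test speaks to prediction diversity. The labels passed in (1 for smelly, 0 for non-smelly, for instance) are an assumed encoding and can be any comparable values.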