AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models

Cited by: 50
Authors
Jiarpakdee, Jirayus [1]
Tantithamthavorn, Chakkrit [1]
Treude, Christoph [1]
Affiliations
[1] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia
Source
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) | 2018
Funding
Australian Research Council
Keywords
Software Analytics; Feature Selection; Defect Prediction; Model Interpretation; Correlated Metrics; PREDICTION; MODULES;
DOI
10.1109/ICSME.2018.00018
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The interpretation of defect models relies heavily on the software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly used feature selection techniques. Through a case study of 13 publicly available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2 percentage points. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend that future studies use AutoSpearman in lieu of commonly used feature selection techniques.
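The abstract does not spell out the selection procedure, so the following is only a hypothetical sketch of automated, correlation-based metric pruning in general, not the authors' AutoSpearman implementation. It assumes a Spearman threshold of 0.7 and a simple rule (from the most strongly correlated pair, drop the metric that is on average more correlated with the remaining metrics). The function names, the threshold, and the sample data are all illustrative assumptions.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant metric as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def prune_correlated(metrics, threshold=0.7):
    """Greedily drop metrics until no pair has |rho| >= threshold.

    metrics: dict mapping a metric name to its list of values.
    threshold: assumed cut-off, not taken from the abstract.
    """
    kept = list(metrics)
    while True:
        rho = {
            (a, b): abs(spearman(metrics[a], metrics[b]))
            for a, b in combinations(kept, 2)
        }
        if not rho:
            return kept
        (a, b), strongest = max(rho.items(), key=lambda item: item[1])
        if strongest < threshold:
            return kept

        def avg_with_others(name):
            # Mean |rho| between `name` and every metric outside the pair.
            others = [v for pair, v in rho.items()
                      if name in pair and pair != (a, b)]
            return mean(others) if others else 0.0

        drop = a if avg_with_others(a) >= avg_with_others(b) else b
        kept.remove(drop)


# Made-up sample data: "loc" and "statements" are strongly rank-correlated.
metrics = {
    "loc": [10, 20, 30, 40, 50, 60],
    "statements": [8, 18, 25, 39, 70, 41],
    "churn": [5, 1, 4, 2, 6, 3],
}
kept = prune_correlated(metrics)
print(kept)
```

Because the selection depends only on the metric values and a fixed rule, rerunning it on the same sample always yields the same subset, which is the kind of consistency the abstract contrasts with feature selection techniques.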
Pages: 92-103
Number of pages: 12