Controlling overfitting in classification-tree models of software quality

Cited by: 26
Authors
Khoshgoftaar T.M. [1]
Allen E.B. [2]
Affiliations
[1] Florida Atlantic University, Boca Raton, FL
[2] Mississippi State University, MS
Keywords
Algorithms - Computer simulation - Data structures - Fault tolerant computer systems - Large scale systems - Telecommunication systems;
DOI
10.1023/A:1009803004576
Abstract
Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on the modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmonotonic relationships. In this paper, we apply the TREEDISC algorithm, which is a refinement of the CHAID algorithm, to build classification-tree models. CHAID-based algorithms differ from other classification-tree algorithms in their reliance on chi-squared tests when building the tree. Classification-tree models are vulnerable to overfitting, where the model reflects the structure of the training data set too closely. Even though a model appears to be accurate on training data, if overfitted, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on expected costs of misclassification, rather than the total number of misclassifications. We conducted a case study of a very large telecommunications system. A two-way analysis of variance with repetitions found that TREEDISC's significance level was highly related to overfitting and can be used to control it. Moreover, the minimum number of modules in a leaf also influenced the degree of overfitting.
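The abstract names three ingredients: chi-squared tests for accepting a split, a significance level and a minimum leaf size that limit tree growth, and an overfitting measure based on expected cost of misclassification. The Python sketch below illustrates these ideas under simplifying assumptions. It is not TREEDISC. It handles only a binary split, meaning a 2x2 table with one degree of freedom. Full CHAID-based algorithms also merge predictor categories and adjust p-values for multiple comparisons. The function names, the example counts, and the cost values are hypothetical. The cost formula follows the common Type I/Type II convention. Treating overfitting as the test-minus-training gap in expected cost is also an assumption, and the paper's exact definition may differ.

```python
import math


def chi2_2x2_pvalue(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    table[i][j] counts the modules in candidate child i (i = 0, 1)
    whose class is j (j = 0: not fault-prone, j = 1: fault-prone).
    With one degree of freedom the upper-tail p-value equals
    erfc(sqrt(chi2 / 2)).
    """
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = rows[0] + rows[1]
    if n == 0 or 0 in rows or 0 in cols:
        return 1.0  # degenerate table: no evidence for a split
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))


def should_split(table, alpha, min_leaf):
    """Hypothetical, simplified stopping rule (not TREEDISC itself):
    split only if every child keeps at least min_leaf modules and the
    split is significant at level alpha."""
    if min(table[0][0] + table[0][1], table[1][0] + table[1][1]) < min_leaf:
        return False
    return chi2_2x2_pvalue(table) < alpha


def expected_cost(actual, predicted, c_i, c_ii):
    """Expected cost of misclassification per module.

    Type I error: a not-fault-prone ("nfp") module predicted "fp" (cost c_i).
    Type II error: a fault-prone ("fp") module predicted "nfp" (cost c_ii).
    """
    n_i = sum(1 for a, p in zip(actual, predicted) if a == "nfp" and p == "fp")
    n_ii = sum(1 for a, p in zip(actual, predicted) if a == "fp" and p == "nfp")
    return (c_i * n_i + c_ii * n_ii) / len(actual)


def overfitting(train_cost, test_cost):
    """Assumed overfitting measure: how much higher the expected cost is
    on an independent test set than on the training set."""
    return test_cost - train_cost


if __name__ == "__main__":
    table = [[80, 5], [40, 25]]  # made-up counts for illustration
    print(should_split(table, alpha=0.05, min_leaf=10))  # True
    print(should_split(table, alpha=1e-8, min_leaf=10))  # False
```

In this sketch, a smaller `alpha` or a larger `min_leaf` makes splits harder to accept, so the tree stays smaller. The abstract reports that both the significance level and the minimum leaf size influenced the degree of overfitting.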
Pages: 59-79
Number of pages: 20
Related papers
(50 in total)
  • [31] Software controlling: Quality-related project controlling
    Bennicke, Marcel
    Hofmann, Alexander
    Lewerentz, Claus
    Wichert, Karl-Heinz
    Informatik-Spektrum, 2008, 31 (06) : 556 - 565
  • [32] An Evaluation of Feature Selection Techniques with Various Software Quality Classification Models
    Gao, Kehan
    Khoshgoftaar, Taghi M.
    Bullard, Lofton A.
    15TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, PROCEEDINGS, 2009, : 357 - +
  • [33] Improving tree-based models of software quality with principal components analysis
    Khoshgoftaar, TM
    Allen, EB
    11TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, PROCEEDINGS, 2000, : 198 - 209
  • [34] Interactions Between Factors Related to the Decision of Sex Offenders to Confess During Police Interrogation: A Classification-Tree Approach
    Beauregard, Eric
    Deslauriers-Varin, Nadine
    St-Yves, Michel
    SEXUAL ABUSE-A JOURNAL OF RESEARCH AND TREATMENT, 2010, 22 (03) : 343 - 367
  • [35] Defect Classification Method for Software Management Quality Control Based on Decision Tree Learning
    Tang Rongfa
    ADVANCED TECHNOLOGY IN TEACHING - PROCEEDINGS OF THE 2009 3RD INTERNATIONAL CONFERENCE ON TEACHING AND COMPUTATIONAL SCIENCE (WTCS 2009), VOL 1: INTELLIGENT UBIQUITOUS COMPUTING AND EDUCATION, 2012, 116 : 721 - 728
  • [36] Averaging classification tree models
    Shannon, WD
    DIMENSION REDUCTION, COMPUTATIONAL COMPLEXITY AND INFORMATION, 1998, 30 : 77 - 83
  • [37] INFORMATIONAL SOFTWARE FOR CONTROLLING QUALITY OF ELECTROENERGY
    MARKUSHEVICH, NS
    ELECTRICAL TECHNOLOGY, 1974, (04): : 37 - 50
  • [38] Rethinking the Impacts of Overfitting and Feature Quality on Small-scale Video Classification
    Wu, Xuansheng
    Yang, Feichi
    Zhou, Tong
    Lin, Xinyue
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4760 - 4764
  • [39] Improving usefulness of software quality classification models based on Boolean Discriminant Functions
    Khoshgoftaar, TM
    Seliya, N
    13TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, PROCEEDINGS, 2002, : 221 - 230
  • [40] A Generic Framework for Automated Quality Assurance of Software Models - Implementation of an Abstract Syntax Tree
    Owens, Darryl
    Anderson, Mark
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (01) : 32 - 38