Data mining in software metrics databases

Times cited: 37
Authors
Dick, S
Meeks, A
Last, M
Bunke, H
Kandel, A
Affiliations
[1] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[2] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2V4, Canada
[3] Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel
[4] Univ Bern, Dept Comp Sci, CH-3012 Bern, Switzerland
[5] Tel Aviv Univ, Coll Engn, IL-69978 Tel Aviv, Israel
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
software reliability; software testing; artificial intelligence; machine learning; data mining; fuzzy clustering
DOI
10.1016/j.fss.2003.10.006
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We investigate the use of data mining for the analysis of software metric databases, and some of the issues in this application domain. Software metrics are collected at various phases of the software development process, in order to monitor and control the quality of a software product. However, software quality control is complicated by the complex relationship between these metrics and the attributes of a software development process. Data mining has been proposed as a potential technology for supporting and enhancing our understanding of software metrics and their relationship to software quality. In this paper, we use fuzzy clustering to investigate three datasets of software metrics, along with the larger issue of whether supervised or unsupervised learning is more appropriate for software engineering problems. While our findings generally confirm the known linear relationship between metrics and change rates, some interesting behaviors are noted. In addition, our results partly contradict earlier studies that only used correlation analysis to investigate these datasets. These results illustrate how intelligent technologies can augment traditional statistical inference in software quality control. © 2003 Elsevier B.V. All rights reserved.
Pages: 81-110
Page count: 30