Low-rank representation for semi-supervised software defect prediction

Cited by: 8
Authors
Zhang, Zhi-Wu [1]
Jing, Xiao-Yuan [2, 3]
Wu, Fei [3]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, Nanjing, Jiangsu, Peoples R China
[2] Wuhan Univ, Sch Comp, State Key Lab Software Engn, Wuhan, Hubei, Peoples R China
[3] Nanjing Univ Posts & Telecommun, Sch Automat, Nanjing, Jiangsu, Peoples R China
Keywords
software engineering; learning (artificial intelligence); pattern clustering; graph theory; program diagnostics; semisupervised software defect prediction; historical defect data; software repositories; automatic defect collection; defect reports; defect-prone modules; semisupervised defect prediction approach; insufficient labelled data; noisy data; unlabelled data; insufficient labelled samples; noisy defect data; low-rank representation; LRRSSDP; SPARSE GRAPH; QUALITY;
DOI
10.1049/iet-sen.2017.0198
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202; 0835
Abstract
Software defect prediction based on machine learning is an active research topic in software engineering. The historical defect data in software repositories may contain noise because automatic defect collection relies on modification logs and defect reports. When few modules have previously been labelled, predicting defect-prone modules becomes a challenging problem. In this study, the authors propose a graph-based semi-supervised defect prediction approach to address the problems of insufficient labelled data and noisy data. Graph-based semi-supervised learning methods use labelled and unlabelled data simultaneously, treating both as nodes of a graph in the training phase, and thereby alleviate the shortage of labelled samples. To improve robustness to noisy defect data, a powerful clustering method, low-rank representation (LRR), is combined with neighbourhood distance to construct the relationship graph of samples. The resulting semi-supervised defect prediction approach is named low-rank representation-based semi-supervised software defect prediction (LRRSSDP). Widely used datasets from NASA projects and noisy datasets are employed as test data to evaluate performance. Experimental results show that (i) LRRSSDP outperforms several representative state-of-the-art semi-supervised defect prediction methods; and (ii) LRRSSDP remains robust in noisy environments.
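The graph-based idea described in the abstract, where labelled and unlabelled modules are all nodes of one graph and labels spread along its edges, can be illustrated with a minimal sketch. This is not LRRSSDP: the record gives no algorithmic detail, so the sketch assumes a Gaussian-weighted k-nearest-neighbour graph in place of the LRR-plus-neighbourhood-distance graph, and uses the standard closed-form label-spreading rule F = (I - alpha S)^(-1) Y. The function names, the toy two-feature data, and the parameters `k`, `sigma`, and `alpha` are all hypothetical.

```python
import numpy as np


def knn_affinity(X, k=2, sigma=1.0):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights.

    Assumption: this simple graph stands in for the LRR-based graph
    named in the abstract, which is not specified in this record.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                      # no self-loops
    nearest = np.argsort(-W, axis=1)[:, :k]       # k strongest edges per node
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(X.shape[0])[:, None], nearest] = True
    return W * (mask | mask.T)                    # symmetrise the edge set


def propagate_labels(W, y, alpha=0.9):
    """Spread known labels over the graph; y uses -1 for unlabelled nodes."""
    n = W.shape[0]
    classes = np.unique(y[y >= 0])
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0                        # one-hot rows for labelled nodes
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # normalised affinity
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)       # closed-form spreading
    return classes[F.argmax(axis=1)]


# Toy data: two hypothetical module metrics, one labelled module per group.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
y = np.array([0, -1, -1, -1, -1, -1, -1, 1])     # 0 = clean, 1 = defect-prone
pred = propagate_labels(knn_affinity(X), y)
print(pred)
```

With only two labelled modules, every unlabelled module still receives a label because it is connected, through the graph, to a labelled one. How well this works on noisy data depends on the quality of the graph, which is the part the abstract says LRR is meant to improve.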
Pages: 527-535
Number of pages: 9