Heterogeneous Defect Prediction Based on Federated Prototype Learning

Cited by: 3
Authors
Wang, Aili [1]
Yang, Linlin [1]
Wu, Haibin [1]
Iwahori, Yuji [2]
Affiliations
[1] Harbin Univ Sci & Technol, Heilongjiang Prov Key Lab Laser Spect Technol & Ap, Harbin 150080, Peoples R China
[2] Chubu Univ, Dept Comp Sci, Kasugai, Aichi 4878501, Japan
Keywords
Heterogeneous defect prediction; federated learning; prototype learning; data islands
DOI
10.1109/ACCESS.2023.3313001
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software defect prediction identifies modules in a software project that are likely to contain defects. Heterogeneous defect prediction (HDP) builds a cross-project defect prediction model from different software defect datasets. However, because multi-source data are heterogeneous, model performance is usually unsatisfactory. In addition, project data holders are unwilling to disclose their data because of privacy regulations and other reasons, which results in data islands. This paper presents federated prototype learning based on prototype averaging (FPLPA), which combines federated learning (FL) with prototype learning for heterogeneous defect prediction. First, each client uses the one-sided selection (OSS) algorithm to remove noise from its local training data and applies the chi-square test to select the optimal feature subset. Second, each client constructs a convolutional prototype network (CPN) to generate its own local prototypes. CPNs are more robust to heterogeneous data than convolutional neural networks (CNNs) and avoid the bias caused by class imbalance in software data. The prototypes serve as the information exchanged between the clients and the server. Because the local prototypes are generated in an irreversible way, they help protect privacy during communication. Finally, the local CPN is updated using the loss between the local and global prototypes as a regularization term. Experiments on 10 projects from three public datasets (AEEEM, NASA, and Relink) show that FPLPA outperforms other HDP solutions.
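The prototype exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: a local prototype is taken to be the per-class mean of a client's embeddings, the server is assumed to average prototypes weighted by sample count, and the regularizer is assumed to be a squared Euclidean distance. The function names are hypothetical, the toy embeddings stand in for CPN outputs, and all clients are assumed to embed into a space of the same dimension.

```python
import numpy as np

def local_prototypes(embeddings, labels):
    """Client side: return {class: (mean embedding, sample count)}.
    Assumption: a prototype is the class-wise mean of the embeddings."""
    protos = {}
    for c in np.unique(labels):
        mask = labels == c
        protos[int(c)] = (embeddings[mask].mean(axis=0), int(mask.sum()))
    return protos

def average_prototypes(client_protos):
    """Server side: count-weighted mean of the clients' prototypes per class.
    Assumption: weighting by sample count; the paper may weight differently."""
    sums, counts = {}, {}
    for protos in client_protos:
        for c, (p, n) in protos.items():
            sums[c] = sums.get(c, 0.0) + n * p
            counts[c] = counts.get(c, 0) + n
    return {c: sums[c] / counts[c] for c in sums}

def prototype_regularizer(protos, global_protos):
    """Sum of squared distances between local and global prototypes.
    Assumption: squared Euclidean distance as the regularization term."""
    return float(sum(np.sum((p - global_protos[c]) ** 2)
                     for c, (p, _) in protos.items() if c in global_protos))

# Toy 2-D embeddings for two clients (class 0 = clean, class 1 = defective).
client_a = local_prototypes(
    np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 4.0]]), np.array([0, 0, 1]))
client_b = local_prototypes(
    np.array([[3.0, 0.0], [6.0, 4.0]]), np.array([0, 1]))

global_protos = average_prototypes([client_a, client_b])
reg_a = prototype_regularizer(client_a, global_protos)
```

Only the prototypes and their counts leave each client in this sketch; the raw module metrics stay local, which mirrors the data-island constraint the abstract describes. In a full training loop, `reg_a` would be added to the client's classification loss with some weighting factor.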
Pages: 98618-98632
Page count: 15