Cross-Project Defect Prediction via Landmark Selection-Based Kernelized Discriminant Subspace Alignment

Cited by: 21
Authors
Li, Zhiqiang [1]
Niu, Jingwen [2]
Jing, Xiao-Yuan [3, 4]
Yu, Wangyang [1]
Qi, Chao [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
[2] Xinxiang Univ, Sch Comp & Informat Engn, Xinxiang 453003, Henan, Peoples R China
[3] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[4] Guangdong Univ Petrochem Technol, Sch Comp, Maoming 525000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-project defect prediction (CPDP); discriminant subspace alignment; domain adaptation; kernel projection; landmark selection; source label propagation; ADAPTATION; CLASSIFICATION; MODEL; CODE;
DOI
10.1109/TR.2021.3074660
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cross-project defect prediction (CPDP) refers to identifying defect-prone software modules in one project (target) using historical data collected from other projects (source), which can help developers find bugs and prioritize their testing efforts. Recently, CPDP has attracted great research interest. However, the source and target data usually exhibit redundancy and nonlinearity. Besides, most CPDP methods do not exploit source label information to uncover the underlying knowledge for label propagation. These factors usually lead to unsatisfactory CPDP performance. To address the above limitations, we propose a landmark selection-based kernelized discriminant subspace alignment (LSKDSA) approach for CPDP. LSKDSA not only reduces the discrepancy between the data distributions of the source and target projects, but also characterizes the complex data structures and increases the probability that the data are linearly separable. Moreover, LSKDSA encodes the label information of the source data into the domain adaptation learning process, which gives it good discriminant ability. Extensive experiments on 13 public projects from three benchmark datasets demonstrate that LSKDSA performs better than a range of competing CPDP methods. The improvements are 3.44%-11.23% in g-measure, 5.75%-11.76% in AUC, and 9.34%-33.63% in MCC.
Pages: 996-1013
Page count: 18
Related papers
57 in total
  • [1] Aljundi R, 2015, PROC CVPR IEEE, P56, DOI 10.1109/CVPR.2015.7298600
  • [2] [Anonymous], 2004, KERNEL METHODS PATTE
  • [3] [Anonymous], 2011, P 19 ACM SIGSOFT S 1
  • [4] Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection
    Belhumeur, PN
    Hespanha, JP
    Kriegman, DJ
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (07) : 711 - 720
  • [5] Training data selection for cross-project defection prediction: which approach is better?
    Bin, Yi
    Zhou, Kai
    Lu, Hongmin
    Zhou, Yuming
    Xu, Baowen
    [J]. 11TH ACM/IEEE INTERNATIONAL SYMPOSIUM ON EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT (ESEM 2017), 2017, : 354 - 363
  • [6] An Empirical Study on Heterogeneous Defect Prediction Approaches
    Chen, Haowen
    Jing, Xiao-Yuan
    Li, Zhiqiang
    Wu, Di
    Peng, Yi
    Huang, Zhiguo
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2021, 47 (12) : 2803 - 2822
  • [7] Cruz AEC, 2009, INT SYMP EMP SOFTWAR, P461
  • [8] Evaluating defect prediction approaches: a benchmark and an extensive comparison
    D'Ambros, Marco
    Lanza, Michele
    Robbes, Romain
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2012, 17 (4-5) : 531 - 577
  • [9] Demsar J, 2006, J MACH LEARN RES, V7, P1
  • [10] Unsupervised Visual Domain Adaptation Using Subspace Alignment
    Fernando, Basura
    Habrard, Amaury
    Sebban, Marc
    Tuytelaars, Tinne
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 2960 - 2967