DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning

Cited by: 46
Authors
Le, Triet Huynh Minh [1]
Hin, David [1, 2]
Croft, Roland [1, 2]
Babar, M. Ali [1, 2]
Affiliations
[1] University of Adelaide, CREST (Centre for Research on Engineering Software Technologies), Adelaide, SA, Australia
[2] Cyber Security Cooperative Research Centre, Adelaide, SA, Australia
Source
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021) | 2021
Keywords
Software vulnerability; Vulnerability assessment; Deep learning; Multi-task learning; Mining software repositories; Software security
DOI
10.1109/ASE51524.2021.9678622
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they are detected to provide timely information about the exploitability, impact and severity of SVs. Such information is important to plan and prioritize the mitigation for the identified SVs. We propose a novel Deep multi-task learning model, DeepCVA, to automate seven Commit-level Vulnerability Assessment tasks simultaneously based on Common Vulnerability Scoring System (CVSS) metrics. We conduct large-scale experiments on 1,229 vulnerability-contributing commits containing 542 different SVs in 246 real-world software projects to evaluate the effectiveness and efficiency of our model. We show that DeepCVA is the best-performing model with 38% to 59.8% higher Matthews Correlation Coefficient than many supervised and unsupervised baseline models. DeepCVA also requires 6.3 times less training and validation time than seven cumulative assessment models, leading to significantly less model maintenance cost as well. Overall, DeepCVA presents the first effective and efficient solution to automatically assess SVs early in software systems.
Pages: 717-729
Page count: 13