VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits

Cited by: 156
Authors
Perl, Henning [1]
Dechand, Sergej [2]
Smith, Matthew [1, 2]
Arp, Daniel [3]
Yamaguchi, Fabian [3]
Rieck, Konrad [3]
Fahl, Sascha [4]
Acar, Yasemin [4]
Affiliations
[1] Fraunhofer FKIE, Wachtberg, Germany
[2] Univ Bonn, Bonn, Germany
[3] Univ Gottingen, Gottingen, Germany
[4] Saarland Univ, Saarbrucken, Germany
Source
CCS'15: PROCEEDINGS OF THE 22ND ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY | 2015
Keywords
Vulnerabilities; Static Analysis; Machine Learning;
DOI
10.1145/2810103.2813604
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Despite the security community's best effort, the number of serious vulnerabilities discovered in software is increasing rapidly. In theory, security audits should find and remove the vulnerabilities before the code ever gets deployed. However, due to the enormous amount of code being produced, as well as the lack of manpower and expertise, not all code is sufficiently audited. Thus, many vulnerabilities slip into production systems. A best-practice approach is to use a code metric analysis tool, such as Flawfinder, to flag potentially dangerous code so that it can receive special attention. However, because these tools have a very high false-positive rate, the manual effort needed to find vulnerabilities remains overwhelming. In this paper, we present a new method of finding potentially dangerous code in code repositories with a significantly lower false-positive rate than comparable systems. We combine code-metric analysis with metadata gathered from code repositories to help code review teams prioritize their work. The paper makes three contributions. First, we conducted the first large-scale mapping of CVEs to GitHub commits in order to create a vulnerable commit database. Second, based on this database, we trained an SVM classifier to flag suspicious commits. Compared to Flawfinder, our approach reduces the number of false alarms by over 99% at the same level of recall. Finally, we present a thorough quantitative and qualitative analysis of our approach and discuss lessons learned from the results. We will share the database as a benchmark for future research and will also provide our analysis tool as a web service.
Pages: 426-437
Number of pages: 12