Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Authors
Zaharia, Sergiu [1]
Rebedea, Traian [1]
Trausan-Matu, Stefan [1, 2]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Bucharest 060042, Romania
[2] Romanian Acad, Inst Artificial Intelligence Mihai Draganescu, Bucharest 050711, Romania
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 13
Keywords
software security engineering; machine learning; code embeddings; common weakness enumeration; zero-shot classification
DOI
10.3390/app13137871
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The research presented in this paper aims to improve the identification of security weaknesses in programming languages that are poorly supported by specialized security analysis tools, drawing on knowledge gathered from securing popular languages, for which security experts, scanners, and labeled datasets are generally available. This goal is vital for reducing the overall exposure of software applications. We propose a solution that extends security weakness detection to downstream languages, influenced by their more popular "ancestors" in the programming languages' evolutionary tree, using language keyword tokenization and clustering based on word embedding techniques. We show that, after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding habits, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection, in our case without using any C# source code as training data.
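The idea of a shared cross-language token representation can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster table below is hand-written and hypothetical (the abstract describes clusters derived from word embeddings of language keywords), and the names `KEYWORD_CLUSTERS`, `to_shared_tokens`, `ID`, and `NUM` are assumptions introduced only for this example.

```python
import re

# Hypothetical keyword clusters. The abstract describes clustering keywords
# via word embeddings; this hand-written table only stands in for that step.
KEYWORD_CLUSTERS = {
    "KW_STRING": {"string", "String", "char"},
    "KW_INT": {"int", "Int32", "long"},
    "KW_NEW": {"new", "malloc"},
    "KW_LOOP": {"for", "foreach", "while"},
    "KW_COND": {"if", "else", "switch"},
    "KW_RETURN": {"return"},
}

# Invert the table: keyword -> cluster label.
KEYWORD_TO_CLUSTER = {
    kw: label for label, kws in KEYWORD_CLUSTERS.items() for kw in kws
}

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def to_shared_tokens(source):
    """Map source text to a language-neutral token sequence (sketch only)."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORD_TO_CLUSTER:
            tokens.append(KEYWORD_TO_CLUSTER[tok])  # keyword -> cluster label
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")                     # any other identifier
        elif tok.isdigit():
            tokens.append("NUM")                    # integer literal
        else:
            tokens.append(tok)                      # punctuation kept as-is
    return tokens


java_snippet = "String name = new String(buf);"
csharp_snippet = "string label = new string(data);"

java_tokens = to_shared_tokens(java_snippet)
csharp_tokens = to_shared_tokens(csharp_snippet)
```

Under these assumed clusters, the Java and C# snippets yield the same token sequence, which is the property that would let a classifier trained only on C, C++, and Java inputs be applied to C# code without C# training data.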
Pages: 17