Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Cited by: 0

Authors:

Zaharia, Sergiu [1]

Rebedea, Traian [1]

Trausan-Matu, Stefan [1,2]

Affiliations:

[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Bucharest 060042, Romania

[2] Romanian Acad, Inst Artificial Intelligence Mihai Draganescu, Bucharest 050711, Romania

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13

Keywords:

software security engineering; machine learning; code embeddings; common weakness enumeration; zero-shot classification

DOI:

10.3390/app13137871

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

The research presented in this paper aims to increase the capacity to identify security weaknesses in programming languages that are less well supported by specialized security analysis tools, using the knowledge gathered from securing popular languages, for which security experts, scanners, and labeled datasets are generally available. This goal is vital for reducing the overall exposure of software applications. We propose a solution that extends security-weakness detection to downstream languages, influenced by their more popular "ancestors" in the evolutionary tree of programming languages, using language keyword tokenization and clustering based on word embedding techniques. We show that, after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding behavior, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we can achieve zero-shot vulnerability detection, in our case without using any C# source code as training data.
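
The idea of a shared cross-language token vocabulary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cluster table, the labels such as KW_STR, and the function name to_shared_tokens are all hypothetical, and the clusters are written by hand here, whereas the abstract describes deriving them with word embedding techniques.

```python
import re

# Hypothetical, hand-written keyword clusters (assumption for illustration only;
# the abstract describes clusters obtained via word embeddings instead).
CLUSTERS = {
    "KW_STR": {"String", "string"},
    "KW_BOOL": {"bool", "boolean"},
    "KW_NULL": {"NULL", "null", "nullptr"},
    "KW_PRINT": {"printf", "println", "WriteLine"},
}
TOKEN_TO_CLUSTER = {tok: label for label, toks in CLUSTERS.items() for tok in toks}

# Keywords spelled identically across the languages are kept unchanged.
COMMON_KEYWORDS = {"if", "else", "return", "int", "char", "void", "new", "class"}


def to_shared_tokens(source):
    """Map source text to a language-neutral token sequence."""
    shared = []
    # Split into identifiers/keywords, integer literals, and single symbols.
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if tok in TOKEN_TO_CLUSTER:
            shared.append(TOKEN_TO_CLUSTER[tok])  # clustered keyword
        elif tok in COMMON_KEYWORDS:
            shared.append(tok)                    # shared keyword
        elif tok.isdigit():
            shared.append("NUM")                  # numeric literal
        elif tok[0].isalpha() or tok[0] == "_":
            shared.append("ID")                   # user-defined identifier
        else:
            shared.append(tok)                    # punctuation or operator
    return shared


java_line = "String s = null;"
csharp_line = "string s = null;"
print(to_shared_tokens(java_line) == to_shared_tokens(csharp_line))
```

Under these assumptions, a Java line and its C# counterpart map to the same token sequence, which is the property a classifier trained only on C, C++, and Java would rely on when applied to C# code.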

Pages: 17
