Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Cited by: 0

Authors:

Zaharia, Sergiu [1]

Rebedea, Traian [1]

Trausan-Matu, Stefan [1, 2]

Affiliations:

[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Bucharest 060042, Romania

[2] Romanian Acad, Inst Artificial Intelligence Mihai Draganescu, Bucharest 050711, Romania

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13

Keywords:

software security engineering; machine learning; code embeddings; common weakness enumeration; zero-shot classification;

DOI:

10.3390/app13137871

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

This paper aims to improve the detection of security weaknesses in programming languages that are poorly supported by specialized security analysis tools, by reusing the knowledge gathered from securing popular languages, for which security experts, scanners, and labeled datasets are generally available. This goal is vital for reducing the overall exposure of software applications. We propose a solution that extends security-weakness detection to downstream languages, which are influenced by their more popular "ancestors" in the evolutionary tree of programming languages, using language keyword tokenization and clustering based on word embedding techniques. We show that after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding habits, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we achieve zero-shot vulnerability detection: in our case, detection without any C# source code in the training data.
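
The idea of a shared cross-language token representation can be illustrated with a minimal Python sketch. It is not the CLaSCoRe implementation: the abstract says the keyword clusters come from word-embedding techniques, whereas the `KEYWORD_CLUSTERS` table below is hand-written, and the names `KEYWORD_CLUSTERS`, `tokenize`, and `normalize` are hypothetical, chosen for illustration only. The sketch only shows how statements from two languages might collapse to the same token sequence, which is what would let a classifier trained on one language be applied to another.

```python
import re

# ASSUMPTION: hand-written keyword clusters for illustration only.
# The abstract describes clusters obtained via word embeddings instead.
KEYWORD_CLUSTERS = {
    "final": "CONST", "const": "CONST", "readonly": "CONST",
    "String": "TYPE_STR", "string": "TYPE_STR",
    "boolean": "TYPE_BOOL", "bool": "TYPE_BOOL",
    "null": "NULL", "NULL": "NULL", "nullptr": "NULL",
}


def tokenize(code):
    """Split source text into identifiers, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def normalize(code):
    """Map each token to a language-neutral token."""
    out = []
    for tok in tokenize(code):
        if tok in KEYWORD_CLUSTERS:
            out.append(KEYWORD_CLUSTERS[tok])   # shared keyword cluster
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")                    # any other identifier
        elif tok.isdigit():
            out.append("NUM")                   # integer literal
        else:
            out.append(tok)                     # punctuation and operators
    return out


java_line = "final String key = null;"
csharp_line = "const string key = null;"
print(normalize(java_line))
print(normalize(csharp_line))
```

Under these assumed clusters, the Java and C# statements normalize to the same sequence, so features learned from the Java form would also match the C# form.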


Pages: 17
