Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Authors
Zaharia, Sergiu [1]
Rebedea, Traian [1]
Trausan-Matu, Stefan [1,2]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Bucharest 060042, Romania
[2] Romanian Acad, Inst Artificial Intelligence Mihai Draganescu, Bucharest 050711, Romania
Source
APPLIED SCIENCES-BASEL, 2023, Vol. 13, Issue 13
Keywords
software security engineering; machine learning; code embeddings; common weakness enumeration; zero-shot classification
DOI
10.3390/app13137871
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
The research presented in this paper aims to increase the capacity to identify security weaknesses in programming languages that are less well supported by specialized security analysis tools, building on the knowledge gathered from securing the popular ones, for which security experts, scanners, and labeled datasets are generally available. This goal is vital for reducing the overall exposure of software applications. We propose a solution that extends security weakness detection to downstream languages, influenced by their more popular "ancestors" in the evolutionary tree of programming languages, using language keyword tokenization and clustering based on word embedding techniques. We show that after training a machine learning algorithm on C, C++, and Java applications developed by a community of programmers with similar coding behavior, we can detect, with acceptable accuracy, similar vulnerabilities in C# source code written by the same community. To achieve this, we propose a core cross-language representation of source code, optimized for security weakness classifiers, named CLaSCoRe. Using this method, we achieve zero-shot vulnerability detection: in our case, no C# source code is used in the training data.
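The abstract describes the cross-language representation only at a high level. The Python sketch below illustrates the general idea behind keyword tokenization into a shared, language-neutral vocabulary, so that a C snippet and a C# snippet with the same weakness pattern map to similar token sequences. Everything here is a hypothetical illustration, not the CLaSCoRe method: the `SHARED_VOCAB` table, the `normalize` function, and the placeholder tokens (`ID`, `NUM`, `STR`, `INPUT`, and so on) are assumptions. According to the abstract, the paper derives its grouping from clustering based on word embeddings rather than from a hand-written table.

```python
import re

# Hypothetical shared vocabulary: language-specific keywords and API names
# mapped to one language-neutral token. NOT the paper's actual mapping;
# CLaSCoRe reportedly derives its groups from word-embedding clustering.
SHARED_VOCAB = {
    "if": "IF", "else": "ELSE", "return": "RETURN",
    "for": "LOOP", "while": "LOOP", "foreach": "LOOP",
    "char": "TYPE", "int": "TYPE", "string": "TYPE", "String": "TYPE",
    "malloc": "ALLOC", "new": "ALLOC",
    "system": "EXEC", "exec": "EXEC",
    # Unbounded or untrusted input sources in C, Java, and C#
    "gets": "INPUT", "scanf": "INPUT", "readLine": "INPUT", "ReadLine": "INPUT",
}

# Tokens: string literals, integer literals, identifiers/keywords, or any
# other single non-whitespace character (operators, punctuation).
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+|[A-Za-z_][A-Za-z0-9_]*|\S')
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalize(code: str) -> list[str]:
    """Convert source code from any C-family language into shared tokens."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in SHARED_VOCAB:
            out.append(SHARED_VOCAB[tok])   # known keyword or API: shared token
        elif tok.startswith('"'):
            out.append("STR")               # any string literal
        elif tok.isdigit():
            out.append("NUM")               # any integer literal
        elif IDENT_RE.fullmatch(tok):
            out.append("ID")                # user-defined identifier
        else:
            out.append(tok)                 # keep operators and punctuation
    return out


c_tokens = normalize("char buf[16]; gets(buf);")
cs_tokens = normalize("string buf = Console.ReadLine();")
print(c_tokens)   # ['TYPE', 'ID', '[', 'NUM', ']', ';', 'INPUT', '(', 'ID', ')', ';']
print(cs_tokens)  # ['TYPE', 'ID', '=', 'ID', '.', 'INPUT', '(', ')', ';']
```

Because both snippets share the `INPUT` token after normalization, a classifier trained only on normalized C, C++, or Java sequences could, in principle, respond to the same pattern in C# code it never saw during training. This is the intuition behind the zero-shot setting described in the abstract, although the actual representation and classifier in the paper are more elaborate.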
Pages: 17