Robust Vulnerability Detection in Solidity-Based Ethereum Smart Contracts Using Fine-Tuned Transformer Encoder Models

Cited by: 1
Authors
Le, Thi-Thu-Huong [1 ,2 ]
Kim, Jaehyun [2 ]
Lee, Sangmyeong [3 ]
Kim, Howon [3 ]
Affiliations
[1] Pusan Natl Univ, Blockchain Platform Res Ctr, Busan 609735, South Korea
[2] Pusan Natl Univ, IoT Res Ctr, Busan 609735, South Korea
[3] Pusan Natl Univ, Sch Comp Sci & Engn, Busan 609735, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Smart contracts; Codes; Transformers; Security; Solid modeling; Analytical models; Training; Encoding; Biological system modeling; Large language models; Ethereum smart contracts; large language models; multi-class imbalance; multi-class classification; smart contract vulnerability; solidity code;
DOI
10.1109/ACCESS.2024.3482389
Chinese Library Classification
TP [Automation and computer technology]
Discipline code
0812
Abstract
The rapid expansion of blockchain technology, particularly Ethereum, has driven widespread adoption of smart contracts. However, the security of these contracts remains a critical concern due to the increasing frequency and complexity of vulnerabilities. This paper presents a comprehensive approach to detecting vulnerabilities in Ethereum smart contracts using pre-trained Large Language Models (LLMs). We apply transformer-based LLMs, leveraging their ability to understand and analyze Solidity code to identify potential security flaws. Our methodology involves fine-tuning eight distinct pre-trained LLMs on curated datasets varying in the types and distributions of vulnerabilities, including multi-class vulnerabilities. The datasets (SB Curate, Benmark Solidity Smart Contract, and ScrawID) were selected to ensure a thorough evaluation of model performance across different vulnerability types. We employed over-sampling techniques to address class imbalances, resulting in more reliable training outcomes. We extensively evaluate these models using precision, recall, accuracy, F1 score, and Receiver Operating Characteristic (ROC) curve metrics. Our results demonstrate that the transformer encoder architecture, with its multi-head attention and feed-forward mechanisms, effectively captures the nuances of smart contract vulnerabilities. The models show promising potential in enhancing the security and reliability of Ethereum smart contracts, offering a robust solution to the challenges posed by software vulnerabilities in the blockchain ecosystem.
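The abstract mentions over-sampling to correct multi-class imbalance and evaluation with per-class precision, recall, and F1. A minimal sketch of those two steps is shown below; the paper does not specify its exact over-sampling algorithm, so plain random over-sampling (duplicating minority-class examples) is assumed here, and all function names are illustrative rather than taken from the paper.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the majority count.

    Assumed technique: simple random over-sampling; the paper may use a
    different balancing scheme.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for cls, xs in by_class.items():
        for _ in range(target - counts[cls]):
            out_x.append(rng.choice(xs))  # re-draw a minority example
            out_y.append(cls)
    return out_x, out_y

def per_class_metrics(y_true, y_pred):
    """Precision, recall, and F1 per class from paired label lists."""
    metrics = {}
    for cls in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics[cls] = {"precision": prec, "recall": rec, "f1": f1}
    return metrics
```

In practice the balanced set would then be tokenized and fed to a fine-tuned transformer encoder classifier; the helpers above only illustrate the data-balancing and scoring stages, which are framework-independent.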
Pages: 154700-154717
Page count: 18