The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study

Cited by: 2
Authors
Aldyaflah, Izdehar M. [1]
Zhao, Wenbing [1]
Yang, Shunkun [2]
Luo, Xiong [3]
Affiliations
[1] Cleveland State Univ, Dept Elect & Comp Engn, Cleveland, OH 44115 USA
[2] Beihang Univ, Sch Reliabil & Syst Engn, Beijing 100191, Peoples R China
[3] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
blockchain; smart contract; vulnerability detection; Word2Vec; FastText; Bag-of-Words; Term Frequency-Inverse Document Frequency
DOI
10.3390/info15060302
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Eliminating vulnerabilities from a smart contract before it is deployed is essential to ensuring the security of decentralized applications. Accordingly, numerous tools and machine-learning-based methods have been proposed to detect vulnerabilities in smart contracts, along with various ways of encoding smart contracts for analysis. However, the impact of these input encodings on detection performance has not been systematically studied; such a study is the primary goal of this paper. In this preliminary study, we experimented with four common input types: Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency-Inverse Document Frequency (TF-IDF). To isolate the effect of the input type, we used the same deep-learning model, a convolutional neural network (CNN), in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types in both binary and multiclass classification scenarios. Our findings show that TF-IDF is the best overall input type of the four and performs well in every scenario: (1) it achieves the best F1 score and accuracy in binary classification for all vulnerability types except the delegate vulnerability, where it comes a close second; and (2) it comes a very close second to BoW (within 0.8%) in multiclass classification.
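To make the difference between two of the compared input types concrete, here is a minimal sketch that encodes three toy Solidity fragments as BoW and TF-IDF vectors. Everything in it is an illustrative assumption rather than the authors' pipeline: the contract fragments are invented (not taken from the study's dataset), the tokenizer and the helper names `tokenize`, `build_vocab`, `bag_of_words`, and `tf_idf` are hypothetical, and TF-IDF uses the textbook weighting tf = count / document length, idf = ln(N / df), which may differ from the variant used in the paper. The CNN that would consume these vectors is not shown.

```python
import math
from collections import Counter

# Hypothetical, simplified Solidity tokenizer (an assumption, not the paper's):
# pad each punctuation character with spaces, then split on whitespace.
PUNCT = "(){};,."


def tokenize(source):
    for ch in PUNCT:
        source = source.replace(ch, f" {ch} ")
    return source.split()


def build_vocab(docs):
    """Sorted list of every distinct token across the tokenized corpus."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc, vocab):
    """BoW: raw count of each vocabulary token in one document."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocab]


def tf_idf(docs, vocab):
    """Textbook TF-IDF (assumed variant): tf = count / len(doc), idf = ln(N / df)."""
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    # Document frequency: how many documents contain each token (always >= 1 here).
    df = {tok: sum(tok in s for s in doc_sets) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[tok] / len(doc)) * math.log(n_docs / df[tok])
            for tok in vocab
        ])
    return vectors


# Invented toy fragments for illustration only.
contracts = [
    "function withdraw() public { msg.sender.call.value(balance)(); balance = 0; }",
    "function deposit() public payable { balance += msg.value; }",
    "function kill() public { selfdestruct(owner); }",
]
docs = [tokenize(c) for c in contracts]
vocab = build_vocab(docs)
bow = [bag_of_words(d, vocab) for d in docs]
tfidf = tf_idf(docs, vocab)

i = vocab.index("balance")
print("BoW counts of 'balance':", [row[i] for row in bow])  # → [2, 1, 0]
j = vocab.index("function")
print("TF-IDF of 'function':", [row[j] for row in tfidf])  # → [0.0, 0.0, 0.0]
```

Tokens that occur in every fragment, such as `function` and `public`, get zero TF-IDF weight while still contributing raw counts to the BoW vector; a token confined to one fragment, such as `selfdestruct`, keeps a positive TF-IDF weight there. This only illustrates how the two encodings differ and does not by itself explain the paper's results.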
Pages: 22