Python API Misuse Mining and Classification Based on Hybrid Analysis and Attention Mechanism

Cited by: 1
Authors
He, Xincheng [1]
Liu, Xiaojin [1]
Xu, Lei [1]
Affiliations
[1] Nanjing University, Department of Computer Science and Technology, Nanjing, People's Republic of China
Keywords
API misuse; Python; program analysis; mining; deep learning
DOI
10.1142/S0218194023500432
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
APIs play a crucial role in contemporary software development, streamlining implementation and maintenance processes. However, improper API usage can result in significant issues such as unexpected outcomes, security vulnerabilities and system crashes. To detect API misuses, current methods primarily rely on comparing established API usage patterns with target points for automated detection, mainly based on pre-validated datasets. Nonetheless, there is a scarcity of publicly available datasets on API misuses and their corresponding fixes, which hinders data-driven research. Moreover, most existing techniques concentrate on statically typed languages, such as Java and C, with only a few addressing dynamic languages like Python effectively, due to difficulties in handling dynamic features. Therefore, it is essential to identify Python API misuses and their fixes automatically and promptly. In this paper, we introduce HatPAM, a Hybrid Analysis and Attention-based Python API-Misuse Miner, which (a) provides a method for automatically mining true-positive commits related to Python API-misuse fixes from GitHub and (b) presents the subsequent processing for classifying Python API misuses in true-positive cases. Particularly, HatPAM applies hybrid static analysis and introduces a structure-based attention mechanism to examine syntax, semantics and structural features in Python code context, and considers the consistency between code and developers' natural intent to significantly reduce false-positive cases. Evaluation on six popular Python projects reveals that HatPAM outperforms various state-of-the-art baselines, achieving up to 92.2% Precision, 86.7% Recall and 89.3% F1-score, indicating its capability to identify and classify Python API-misuse commits.
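As an illustration only: the abstract does not describe HatPAM's implementation, so the following is a minimal sketch of one simple static step that a commit miner of this kind could use, namely comparing the API calls in the pre-fix and post-fix versions of a file. The function names (`dotted_name`, `call_signatures`, `changed_calls`) and the sample snippets are hypothetical and are not taken from the paper.

```python
import ast
from collections import Counter


def dotted_name(node):
    """Return a dotted name such as 'os.path.join' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def call_signatures(source):
    """Count (callee, number of positional args, sorted keyword names) per call site."""
    sigs = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is not None:
                # k.arg is None for **kwargs, which is skipped here
                kws = tuple(sorted(k.arg for k in node.keywords if k.arg))
                sigs[(name, len(node.args), kws)] += 1
    return sigs


def changed_calls(before, after):
    """Return (removed, added) call signatures between two versions of a file."""
    old, new = call_signatures(before), call_signatures(after)
    return sorted((old - new).keys()), sorted((new - old).keys())


# Hypothetical pre-fix and post-fix snippets from a commit
before = "f = open(path)\ndata = f.read()\n"
after = "f = open(path, encoding='utf-8')\ndata = f.read()\n"

removed, added = changed_calls(before, after)
print("removed:", removed)
print("added:", added)
```

A difference in call signatures only marks a commit as a candidate. Deciding whether it is a true-positive misuse fix, and which class of misuse it belongs to, is the part the abstract attributes to HatPAM's hybrid analysis and attention-based model.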
Pages: 1567-1597
Page count: 31