Towards a Lightweight, Hybrid Approach for Detecting DOM XSS Vulnerabilities with Machine Learning

Cited by: 22
Authors
Melicher, William [1]
Fung, Clement [1]
Bauer, Lujo [1]
Jia, Limin [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) | 2021
Funding
National Science Foundation (USA); Andrew W. Mellon Foundation (USA)
Keywords
web security; DOM XSS vulnerabilities; neural networks;
DOI
10.1145/3442381.3450062
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Client-side cross-site scripting (DOM XSS) vulnerabilities in web applications are common, hard to identify, and difficult to prevent. Taint tracking is the most promising approach for detecting DOM XSS with high precision and recall, but is too computationally expensive for many practical uses. We investigate whether machine learning (ML) classifiers can replace or augment taint tracking when detecting DOM XSS vulnerabilities. Through a large-scale web crawl, we collect over 18 billion JavaScript functions and use taint tracking to label over 180,000 functions as potentially vulnerable. With this data, we train a deep neural network (DNN) to analyze a JavaScript function and predict if it is vulnerable to DOM XSS. We experiment with a range of hyperparameters and present a low-latency, high-recall classifier that could serve as a pre-filter to taint tracking, reducing the cost of stand-alone taint tracking by 3.43x while detecting 94.5% of unique vulnerabilities. We argue that this combination of a DNN and taint tracking is efficient enough for a range of use cases for which taint tracking by itself is not, including in-browser run-time DOM XSS detection and analyzing large codebases.
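The pre-filter arrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `prefilter_then_taint_track`, `toy_score`, `toy_taint_track`, and the threshold value are hypothetical names and settings chosen for illustration, and the keyword heuristics merely stand in for the trained DNN and for a real taint-tracking engine.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FilterResult:
    flagged: List[str]  # functions confirmed by the expensive analysis
    analyzed: int       # functions that reached the expensive analysis
    total: int          # functions seen overall


def prefilter_then_taint_track(
    functions: Iterable[str],
    score: Callable[[str], float],
    taint_track: Callable[[str], bool],
    threshold: float = 0.1,
) -> FilterResult:
    """Run a cheap classifier first; send only suspicious functions onward.

    A low threshold favors recall: few vulnerable functions are skipped,
    at the price of passing more benign ones to the expensive step.
    """
    flagged: List[str] = []
    analyzed = 0
    total = 0
    for source in functions:
        total += 1
        if score(source) < threshold:
            continue  # classifier considers it benign; skip taint tracking
        analyzed += 1
        if taint_track(source):
            flagged.append(source)
    return FilterResult(flagged, analyzed, total)


def toy_score(source: str) -> float:
    """Hypothetical stand-in for the DNN: looks for common DOM XSS sinks."""
    sinks = ("innerHTML", "document.write", "eval(")
    return 0.9 if any(s in source for s in sinks) else 0.0


def toy_taint_track(source: str) -> bool:
    """Hypothetical stand-in for taint tracking: looks for attacker-controlled sources."""
    sources = ("location.hash", "location.search", "document.referrer")
    return any(s in source for s in sources)


demo_functions = [
    "function a(){ el.innerHTML = location.hash; }",
    "function b(){ return 1 + 1; }",
    "function c(){ el.innerHTML = 'static'; }",
]
result = prefilter_then_taint_track(demo_functions, toy_score, toy_taint_track)
print(f"analyzed {result.analyzed} of {result.total}, flagged {len(result.flagged)}")
```

In this toy run only the functions that touch a sink reach the expensive step, which is the source of the cost saving; how much is saved in practice depends on the classifier's precision at the chosen recall level.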
Pages: 2684-2695
Number of pages: 12