Reinforcement Learning for Efficient Network Penetration Testing

Cited by: 70
Authors
Ghanem, Mohamed C. [1 ]
Chen, Thomas M. [1 ]
Affiliations
[1] Univ London, Sch Math Comp Sci & Engn, London EC1V 0HB, England
Keywords
penetration testing; artificial intelligence; machine learning; reinforcement learning; network security auditing; offensive cyber-security; vulnerability assessment; value iteration
DOI
10.3390/info11010006
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Penetration testing (also known as pentesting or PT) is a common practice for actively assessing the defenses of a computer network by planning and executing all possible attacks to discover and exploit existing vulnerabilities. Despite evolving tools, current penetration testing methods are increasingly non-standard, composite, and resource-consuming. In this paper, we propose and evaluate an AI-based pentesting system that uses machine learning, namely reinforcement learning (RL), to learn and reproduce average and complex pentesting activities. The proposed system, the Intelligent Automated Penetration Testing System (IAPTS), consists of a module that integrates with industrial PT frameworks to enable them to capture information, learn from experience, and reproduce tests in future, similar testing cases. IAPTS aims to save human resources while producing much-improved results in terms of testing time, reliability, and frequency. IAPTS models PT environments and tasks as a partially observable Markov decision process (POMDP) that is solved with an external POMDP solver. Although the scope of this paper is limited to PT planning for network infrastructures rather than the entire practice, the obtained results support the hypothesis that RL can enhance PT beyond the capabilities of any human PT expert in terms of time consumed, attack vectors covered, and accuracy and reliability of the outputs. In addition, this work tackles the difficult problem of capturing and reusing expertise by allowing the IAPTS learning module to store and reuse PT policies, much as a human PT expert would learn, but more efficiently.
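To make the POMDP formulation in the abstract concrete, the sketch below models a single-host attack step as a toy POMDP and plans with the QMDP heuristic (value iteration on the underlying MDP, then acting against the current belief over hidden host states). This is not the authors' IAPTS implementation, which relies on an external POMDP solver; every state, action, transition probability, and reward below is a hypothetical placeholder chosen only for illustration.

# Minimal, illustrative sketch (not the authors' IAPTS code): a single-host
# pentest step as a tiny POMDP, planned with the QMDP heuristic.
# All states, actions, probabilities, and rewards are hypothetical.
import numpy as np

STATES = ["vulnerable", "patched", "terminal"]   # hidden host condition
ACTIONS = ["scan", "exploit", "stop"]
S, A, gamma = len(STATES), len(ACTIONS), 0.95

# Transition model T[a, s, s']: scanning leaves the host unchanged; a
# successful exploit or a "stop" ends the engagement (absorbing state).
T = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # scan
    [[0.1, 0.0, 0.9], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # exploit
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # stop
])

# Expected immediate rewards R[a, s]: scanning costs effort, exploiting a
# vulnerable host pays off, exploiting a patched host wastes time.
R = np.array([
    [-1.0, -1.0, 0.0],   # scan
    [ 9.0, -5.0, 0.0],   # exploit
    [ 0.0,  0.0, 0.0],   # stop
])

# Value iteration on the fully observable MDP yields Q[a, s].
V = np.zeros(S)
for _ in range(1000):
    Q = R + gamma * np.einsum("asx,x->as", T, V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

def qmdp_action(belief):
    """Pick the action with the highest belief-weighted Q-value (QMDP)."""
    return ACTIONS[int(np.argmax(Q @ belief))]

# Example: a scan result suggests the host is probably vulnerable.
print(qmdp_action(np.array([0.8, 0.2, 0.0])))   # -> "exploit"

QMDP is used here only because it keeps the example short; it ignores the value of information-gathering actions that a full POMDP solver, as used in IAPTS, would account for when choosing between further scanning and exploitation.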
Pages: 23