The Proposal of Double Agent Architecture using Actor-critic Algorithm for Penetration Testing

Cited by: 14
Authors
Nguyen, Hoang Viet [1 ]
Teerakanok, Songpon [2 ]
Inomata, Atsuo [3 ]
Uehara, Tetsutaro [1 ]
Affiliations
[1] Ritsumeikan Univ, Coll Informat Sci & Engn, Cyber Secur Lab, Kyoto, Japan
[2] Ritsumeikan Univ, Res Org Sci & Technol, Kyoto, Japan
[3] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka, Japan
Source
ICISSP: PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY | 2021
Keywords
Penetration Testing; Deep Reinforcement Learning
DOI
10.5220/0010232504400449
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Reinforcement learning (RL) is a widely used machine learning method for optimal decision-making, offering advantages over rule-based methods. Because of this advantage, RL has recently been applied extensively to penetration testing (PT) problems to assist in planning and deploying cyber attacks. Although the complexity and size of networks keep increasing rapidly, RL is currently applied only to small-scale networks. This paper proposes a double agent architecture (DAA) approach that drastically increases the size of the networks that can be solved with RL. This work also examines the effectiveness of popular deep reinforcement learning algorithms, including DQN, DDQN, Dueling DQN, and D3QN, for PT. The A2C algorithm using the Wolpertinger architecture is adopted as a baseline for comparing the results of the methods. All algorithms are evaluated in a proposed network simulator constructed as a Markov decision process (MDP). Our results demonstrate that DAA with the A2C algorithm far outperforms the other approaches when dealing with large network environments of up to 1,000 hosts.
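The paper's simulator itself is not reproduced in this record, but the abstract's core formulation, penetration testing as an MDP in which an agent chooses which host to exploit and is rewarded for reaching a target, can be illustrated with a minimal sketch. All names, the topology, the reward values, and the success probability below are hypothetical choices, not the authors' actual environment:

```python
import random


class ToyPenTestEnv:
    """Minimal penetration-testing environment modeled as an MDP.

    Hypothetical sketch: the state is a tuple of per-host "compromised"
    flags, an action is the index of a host to exploit, and the episode
    ends when the designated target host is owned. Dynamics and rewards
    are illustrative only.
    """

    def __init__(self, num_hosts=5, target=4, success_prob=0.8, seed=0):
        self.num_hosts = num_hosts
        self.target = target
        self.success_prob = success_prob
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Host 0 is the attacker's initial foothold; nothing else is owned.
        self.owned = [True] + [False] * (self.num_hosts - 1)
        return tuple(self.owned)

    def step(self, host):
        """Attempt to exploit `host`; returns (state, reward, done)."""
        reward = -1.0  # per-action cost, encouraging short attack paths
        if not self.owned[host] and self.rng.random() < self.success_prob:
            self.owned[host] = True  # exploit succeeded
        done = self.owned[self.target]
        if done:
            reward += 100.0  # terminal bonus for owning the target host
        return tuple(self.owned), reward, done
```

In this framing the action space grows with the number of hosts, which is exactly why a 1,000-host network is hard for vanilla DQN-family agents and why the paper turns to architectures such as Wolpertinger and the proposed DAA that cope with large discrete action spaces.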
Pages: 440-449
Page count: 10
References
12 records in total
[1] Mnih V., et al., 2013, Playing Atari with Deep Reinforcement Learning
[2] [Anonymous], 2016, McKinsey Quarterly
[3] Dulac-Arnold G., 2015, arXiv preprint
[4] Ghanem M.C., Chen T.M. Reinforcement Learning for Efficient Network Penetration Testing [J]. INFORMATION, 2020, 11(1)
[5] Hasselt H., 2010, Advances in Neural Information Processing Systems, V23, DOI 10.5555/2997046.2997187
[6] Hoang V.N., P INT C FUTURE NETWO
[7] Kaelbling L.P., Littman M.L., Moore A.W. Reinforcement learning: A survey [J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285
[8] Mnih V., 2016, Proceedings of the International Conference on Machine Learning, p. 1928
[9] Phillips C., 1999, New Security Paradigms Workshop, Proceedings, p. 71
[10] Sarraute C., 2013, arXiv preprint