Automating post-exploitation with deep reinforcement learning

Cited by: 36

Authors:

Maeda, Ryusei [1]

Mimura, Mamoru [1]

Affiliation:

[1] 1-10-20 Hashirimizu, Yokosuka, Kanagawa, Japan

Source:

COMPUTERS & SECURITY | 2021 / Vol. 100

Keywords:

Reinforcement learning; Post-exploitation; A2C; Q-Learning; SARSA; Deep reinforcement learning; Lateral movement

DOI:

10.1016/j.cose.2020.102108

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

To assess the risk to information systems, it is important to investigate the behavior of the attacker after successful exploitation (post-exploitation). However, such an audit requires experts, and to the best of our knowledge, no existing solution automates this process. This paper proposes a method for automating post-exploitation by combining deep reinforcement learning with PowerShell Empire, a well-known post-exploitation framework. Our reinforcement learning agents select one of the PowerShell Empire modules as an action. The state of the agents is defined by 10 parameters, such as the type of account that the agents compromised. In the learning phase, we compared the learning progress of three reinforcement learning models: A2C, Q-Learning, and SARSA. The results show that A2C gained reward most efficiently. Moreover, the behavior of the trained agents is evaluated in a test domain network. The results show that the trained agent using A2C could obtain administrative privileges on the domain controller. (C) 2020 The Authors. Published by Elsevier Ltd.
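The abstract names Q-Learning and SARSA but does not state their update rules. As a rough illustration only, the sketch below shows the textbook tabular forms of the two temporal-difference targets, which differ in how they bootstrap from the next state. Everything in it is an assumption for illustration: the integer states and actions, the sample Q-table, and the function names are hypothetical and do not reproduce the paper's 10-parameter state, its action set, or its deep (neural network) models.

```python
# Textbook tabular TD targets; hypothetical names, not the paper's code.
# States and actions are plain integers used as indices into a Q-table.

def q_learning_target(q, reward, next_state, gamma, done):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward if done else reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma, done):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward if done else reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the given target."""
    q[state][action] += alpha * (target - q[state][action])


# Toy Q-table (assumed values): 2 states x 2 actions.
q = [[0.0, 0.0],
     [1.0, 3.0]]

# One transition: from state 0, action 0, reward 1.0, landing in state 1.
# The behavior policy then happens to pick the non-greedy action 0.
ql = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5, done=False)
sa = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5, done=False)

td_update(q, state=0, action=0, target=ql, alpha=0.5)
```

With these assumed values the Q-Learning target is 2.5 and the SARSA target is 1.5: the two rules agree only when the next action chosen by the policy is also the greedy one.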

Pages: 13
