Reward-based Monte Carlo-Bayesian reinforcement learning for cyber preventive maintenance

Cited: 14
Authors
Allen, Theodore T. [1 ]
Roychowdhury, Sayak [2 ]
Liu, Enhao [1 ]
Affiliations
[1] Ohio State Univ, Integrated Syst Engn, 1971 Neil Ave 210 Baker Syst, Columbus, OH 43221 USA
[2] Indian Inst Technol, Ind & Syst Engn, Kharagpur 721302, W Bengal, India
Funding
US National Science Foundation (NSF);
Keywords
Preventive maintenance; Cyber security; Markov decision processes; Parametric uncertainty; MULTIPLE TASKS;
DOI
10.1016/j.cie.2018.09.051
CLC Number
TP39 [Computer Applications];
Subject Classification
081203; 0835;
Abstract
This article considers a preventive maintenance problem related to cyber security in universities. A Bayesian Reinforcement Learning (BRL) problem is formulated using limited data from scan results and intrusion detection system warnings. The median estimated learning time (MELT) measure is introduced to evaluate how quickly a control system effectively eliminates parametric uncertainty, i.e., concentrates probability on a single scenario. In a numerical study, Monte Carlo BRL enhanced with Latin hypercube sampling (LHS) for scenario generation, identical-systems multi-task learning, and reward-based learning achieves shorter MELT values ("faster" learning) and improved objective values compared with alternatives. Rigorous results establish the optimality of the derived control strategies and show that optimal learning is possible under steady-state assumptions. A real-world case study of policies for patching critical cyber vulnerabilities on Linux servers yields insights, including the potential to reduce expenditure per host by mandating compensating controls for critical vulnerabilities.
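To make the abstract's key quantities concrete, the sketch below illustrates how LHS can generate parameter scenarios and how a MELT-style statistic can be estimated: the median number of Bayesian updates until posterior probability concentrates on a single scenario. This is a minimal sketch under our own assumptions (a one-dimensional scenario parameter, a Bernoulli incident model, a 0.95 concentration threshold, and hypothetical function names), not the paper's actual MDP formulation or algorithm.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Latin hypercube sample of n points on [0, 1)^d: one stratum per point per dimension."""
    perms = np.argsort(rng.random((n, d)), axis=0)   # independent random permutation per column
    return (perms + rng.random((n, d))) / n          # uniform jitter inside each stratum

def learning_time(rates, true_idx, rng, threshold=0.95, max_steps=5000):
    """Steps of Bayesian updating until posterior mass on one scenario reaches the threshold."""
    probs = np.full(len(rates), 1.0 / len(rates))    # uniform prior over LHS scenarios
    for t in range(1, max_steps + 1):
        incident = rng.random() < rates[true_idx]    # simulated incident under the true scenario
        like = rates if incident else 1.0 - rates    # Bernoulli likelihood of each scenario
        probs *= like
        probs /= probs.sum()
        if probs.max() >= threshold:
            return t                                 # parametric uncertainty effectively eliminated
    return max_steps

rng = np.random.default_rng(7)
rates = 0.5 * latin_hypercube(20, 1, rng).ravel()    # 20 scenario incident rates in (0, 0.5)
times = [learning_time(rates, true_idx=5, rng=rng) for _ in range(200)]
print("MELT estimate (median learning time):", np.median(times))
```

In the paper's setting the scenarios parameterize a Markov decision process and learning interacts with the reward-based control policy; this sketch isolates only the scenario-elimination mechanics behind a MELT-style measure.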
Pages: 578-594 (17 pages)