Guided probabilistic reinforcement learning for sampling-efficient maintenance scheduling of multi-component system

Cited by: 3
Authors
Zhang, Yiming [1 ,2 ]
Zhang, Dingyang [1 ]
Zhang, Xiaoge [3 ]
Qiu, Lemiao [1 ]
Chan, Felix T. S. [4 ]
Wang, Zili [2 ]
Zhang, Shuyou [1 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Engn Res Ctr Design Engn & Digital Twin Zhejiang P, Hangzhou 310027, Peoples R China
[3] Hong Kong Polytech Univ, Dept Ind & Syst Engn, Kowloon, Hong Kong, Peoples R China
[4] Macau Univ Sci & Technol, Dept Decis Sci, Ave Wai Long, Taipa, Macao, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep Reinforcement Learning; Multi-component System; Probabilistic Machine Learning; Maintenance Scheduling; Sampling-Efficient Learning; POLICY; RELIABILITY; ALGORITHM; MODEL;
DOI
10.1016/j.apm.2023.03.025
Chinese Library Classification
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
In recent years, multi-agent deep reinforcement learning has progressed rapidly, as reflected by its increasing adoption in industrial applications. This paper proposes a Guided Probabilistic Reinforcement Learning (Guided-PRL) model to tackle maintenance scheduling of multi-component systems in the presence of uncertainty, with the goal of minimizing the overall life-cycle cost. The proposed Guided-PRL is deeply rooted in the Actor-Critic (AC) scheme. Since traditional AC falls short in sampling efficiency and tends to get stuck in local minima in the context of multi-agent reinforcement learning, it is challenging for the actor network to converge to a solution of desirable quality even when the critic network is properly configured. To address these issues, we develop a generic framework to facilitate effective training of the actor network; the framework consists of environmental reward modeling, degradation formulation, state representation, and policy optimization. The convergence speed of the actor network is significantly improved by a guided sampling scheme for environment exploration that exploits rules-based domain-expert policies. To handle data scarcity, the environmental modeling and policy optimization are approximated with Bayesian models for effective uncertainty quantification. The Guided-PRL model is evaluated using simulations of a 12-component system as well as GE90 and CFM56 engines. Compared with four alternative deep reinforcement learning schemes, Guided-PRL lowers life-cycle cost by 34.92% to 88.07%. In comparison with rules-based expert policies, Guided-PRL decreases life-cycle cost by 23.26% to 51.36%. © 2023 Elsevier Inc. All rights reserved.
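To make the guided-sampling idea in the abstract concrete, the following minimal sketch mixes a rules-based expert policy with a learned actor policy when choosing maintenance actions during environment exploration. It is not the authors' implementation: the threshold rule, the logistic stand-in for the actor network, the annealed mixing probability `beta`, and all function names are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): guided exploration
# that follows a rules-based expert policy with probability beta and the learned actor
# otherwise, so early rollouts are steered toward sensible maintenance actions.
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(degradation, threshold=0.7):
    """Rule-based expert (hypothetical rule): maintain any component whose
    degradation level exceeds a fixed threshold."""
    return (degradation > threshold).astype(int)  # 1 = maintain, 0 = do nothing

def actor_policy(degradation, weights):
    """Toy stochastic actor: per-component maintenance probability from a
    logistic model of the degradation state (stand-in for the actor network)."""
    p = 1.0 / (1.0 + np.exp(-(weights * degradation - 2.0)))
    return (rng.random(degradation.shape) < p).astype(int)

def guided_action(degradation, weights, beta):
    """Guided sampling: with probability beta follow the expert policy,
    otherwise sample from the actor; beta is annealed toward a small floor
    so the learned policy gradually takes over exploration."""
    if rng.random() < beta:
        return expert_policy(degradation)
    return actor_policy(degradation, weights)

# Example rollout over a 12-component system (matching the case-study size).
n_components, horizon = 12, 5
weights = np.full(n_components, 4.0)
degradation = rng.random(n_components)

for t in range(horizon):
    beta = max(0.1, 1.0 - 0.2 * t)  # annealed guidance probability (assumption)
    action = guided_action(degradation, weights, beta)
    # Maintained components reset to pristine; others degrade further.
    degradation = np.where(action == 1, 0.0, np.clip(degradation + 0.1, 0.0, 1.0))
    print(f"step {t}: beta={beta:.1f}, maintained={int(action.sum())} components")
```

In an actor-critic training loop, the transitions collected this way would feed the critic and the policy update; the guidance probability only shapes which actions are sampled, not how they are evaluated.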
Pages: 677-697
Page count: 21