Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems

Cited by: 2
Authors
Yang, Min [1 ]
Liu, Guanjun [1 ]
Zhou, Ziyuan [1 ]
Wang, Jiacun [2 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci, Shanghai 201804, Peoples R China
[2] Monmouth Univ, Comp Sci & Software Engn Dept, West Long Branch, NJ 07764 USA
Keywords
Job shop scheduling; Decision making; Automata; Probabilistic logic; Deep reinforcement learning; Real-time systems; Trajectory; Manufacturing; Standards; Monitoring; Deep reinforcement learning (DRL); performance improvement framework; probabilistic automata; real-time monitoring; key probabilistic decision-making unit (PDMU)-action pairs
DOI
10.1109/JAS.2024.124818
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, due to the inherent uncertainty of such models, rigorous validation is required before they can be applied to real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the "black-box" nature of DRL makes it challenging to test model behavior. We propose a novel performance improvement framework based on probabilistic automata, which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectories of the DRL system by abstracting states into probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under each key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action, and new action are encapsulated as monitors that guide the DRL system toward more favorable results through a real-time monitoring and correction mechanism. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.
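The pipeline described in the abstract (abstract trajectories into PDMUs, accumulate transition statistics into a probabilistic automaton, and identify the PDMU-action pairs most associated with adverse outcomes) can be sketched roughly as follows. This is not the authors' code: the trajectory format, the `abstract_state` function, and the ranking by empirical adverse-outcome frequency (used here in place of the paper's reverse BFS) are all illustrative assumptions.

```python
from collections import defaultdict

def build_automaton(trajectories, abstract_state):
    """Build transition counts and outcome statistics for a probabilistic automaton.

    Each trajectory is (steps, outcome), where steps is a list of
    (state, action) pairs and outcome is 'good' or 'bad'.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (pdmu, action) -> next pdmu -> count
    bad_hits = defaultdict(int)                     # (pdmu, action) -> bad-trajectory count
    total = defaultdict(int)                        # (pdmu, action) -> total occurrences
    for steps, outcome in trajectories:
        pdmus = [abstract_state(s) for s, _ in steps]  # state abstraction -> PDMUs
        for i, (_, action) in enumerate(steps):
            pair = (pdmus[i], action)
            total[pair] += 1
            if outcome == 'bad':
                bad_hits[pair] += 1
            if i + 1 < len(steps):
                counts[pair][pdmus[i + 1]] += 1  # empirical transition frequency
    return counts, bad_hits, total

def key_pairs(bad_hits, total, k=3):
    """Rank PDMU-action pairs by empirical probability of an adverse outcome."""
    scores = {p: bad_hits[p] / total[p] for p in total}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A monitor, in this simplified view, would then intercept the agent at runtime: whenever the abstracted current state matches a key PDMU and the policy proposes the undesirable action, the monitor substitutes the new action found to favor good outcomes.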
Pages: 2327-2339 (13 pages)