Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics

Cited by: 1
Authors
Berger, Sandrine [1 ]
Ramo, Andrea Arroyo [1 ]
Guillet, Valentin [2 ]
Lahire, Thibault [2 ]
Martin, Brice [2 ]
Jardin, Thierry [1 ]
Rachelson, Emmanuel [2 ]
Affiliations
[1] Univ Toulouse, Dept Aerodynam & Prop, ISAE SUPAERO, Toulouse, France
[2] Univ Toulouse, Dept Complex Syst Engn, ISAE SUPAERO, Toulouse, France
Source
DATA-CENTRIC ENGINEERING | 2024, Vol. 5
Keywords
Benchmark for aerodynamics; computational fluid dynamics; deep reinforcement learning; off-policy algorithms; reliability; NEURAL-NETWORKS; FLOWS;
DOI
10.1017/dce.2023.28
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) is promising for solving control problems in fluid mechanics, but it is a new field with many open questions. Possibilities are numerous, and guidelines are rare concerning the choice of algorithms or the best formulation for a given problem. Besides, DRL algorithms learn a control policy by collecting samples from an environment, which may be very costly when used with Computational Fluid Dynamics (CFD) solvers. Algorithms must therefore minimize the number of samples required for learning (sample efficiency) and generate a usable policy from each training run (reliability). This paper aims to (a) evaluate three existing algorithms (DDPG, TD3, and SAC) on a fluid mechanics problem with respect to reliability and sample efficiency across a range of training configurations, (b) establish a fluid mechanics benchmark of increasing data collection cost, and (c) provide practical guidelines and insights for the fluid dynamics practitioner. The benchmark consists of controlling an airfoil to reach a target. The problem is solved with either a low-cost low-order model or a high-fidelity CFD approach. The study found that DDPG and TD3 have learning stability issues that depend strongly on the DRL hyperparameters and reward formulation, and therefore require significant tuning. In contrast, SAC is shown to be both reliable and sample efficient across a wide range of parameter setups, making it well suited to solving fluid mechanics problems and setting up new cases without tremendous effort. In particular, SAC is resistant to small replay buffers, which could be critical if full flow fields were to be stored.
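The practical takeaway above (an off-policy agent such as SAC remaining reliable even with a small replay buffer) can be illustrated with a minimal sketch. The sketch below is not the paper's benchmark code: it assumes the gymnasium and stable-baselines3 libraries, and the AirfoilTargetEnv class, its dynamics, and every parameter value are invented stand-ins for the low-order target-reaching task described in the abstract.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC


class AirfoilTargetEnv(gym.Env):
    """Hypothetical low-order stand-in: a controlled point must reach a random target."""

    def __init__(self, dt=0.05, max_steps=200):
        super().__init__()
        self.dt = dt
        self.max_steps = max_steps
        # Observation: controlled point position (x, y) and target position (x, y).
        self.observation_space = spaces.Box(-5.0, 5.0, shape=(4,), dtype=np.float32)
        # Action: commanded velocity components, a crude proxy for airfoil control inputs.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(-1.0, 1.0, size=2).astype(np.float32)
        self.target = self.np_random.uniform(-2.0, 2.0, size=2).astype(np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Simple kinematic update, clipped to the observation bounds.
        self.pos = np.clip(self.pos + self.dt * np.asarray(action, dtype=np.float32), -5.0, 5.0)
        self.steps += 1
        dist = float(np.linalg.norm(self.pos - self.target))
        reward = -dist                      # dense reward: negative distance to the target
        terminated = dist < 0.05            # episode ends when the target is reached
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate([self.pos, self.target]).astype(np.float32)


if __name__ == "__main__":
    env = AirfoilTargetEnv()
    # Deliberately small replay buffer, echoing the abstract's remark that SAC
    # tolerates small buffers (relevant when full flow fields would have to be stored).
    model = SAC("MlpPolicy", env, buffer_size=10_000, seed=0, verbose=1)
    model.learn(total_timesteps=50_000)

Swapping SAC for TD3 or DDPG is a one-line change in stable-baselines3, which is the kind of like-for-like algorithm comparison the abstract reports, leaving aside the hyperparameter sweeps and the CFD-based variant of the environment.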
Pages: 32
Related papers
50 records in total
  • [1] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
    Cheng, Yuhu
    Chen, Lin
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
  • [2] Off-Policy Differentiable Logic Reinforcement Learning
    Zhang, Li
    Li, Xin
    Wang, Mingzhong
    Tian, Andong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976 : 617 - 632
  • [3] Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning
    Yang, Yana
    Xi, Meng
    Dai, Huiao
    Wen, Jiabao
    Yang, Jiachen
    SENSORS, 2024, 24 (23)
  • [4] Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding
    Tan, Xiaoyu
    Qu, Chao
    Xiong, Junwu
    Zhang, James
    Qiu, Xihe
    Jin, Yaochu
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04): : 2974 - 2986
  • [5] An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning
    Meng, Wenjia
    Zheng, Qian
    Shi, Yue
    Pan, Gang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (05) : 2223 - 2235
  • [6] A multi-step on-policy deep reinforcement learning method assisted by off-policy policy evaluation
    Zhang, Huaqing
    Ma, Hongbin
    Mersha, Bemnet Wondimagegnehu
    Jin, Ying
    APPLIED INTELLIGENCE, 2024, 54 (21) : 11144 - 11159
  • [7] Traffic Signal Control Using End-to-End Off-Policy Deep Reinforcement Learning
    Chu, Kai-Fung
    Lam, Albert Y. S.
    Li, Victor O. K.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07) : 7184 - 7195
  • [8] An off-policy deep reinforcement learning-based active learning for crime scene investigation image classification
    Zhang, Yixin
    Liu, Yang
    Jiang, Guofan
    Yang, Yuchen
    Zhang, Jian
    Jing, Yang
    Alizadehsani, Roohallah
    Tadeusiewicz, Ryszard
    Plawiak, Pawel
    INFORMATION SCIENCES, 2025, 710
  • [9] Off-policy deep reinforcement learning with automatic entropy adjustment for adaptive online grid emergency control
    Zhang, Ying
    Yue, Meng
    Wang, Jianhui
    ELECTRIC POWER SYSTEMS RESEARCH, 2023, 217
  • [10] Safe Off-Policy Deep Reinforcement Learning Algorithm for Volt-VAR Control in Power Distribution Systems
    Wang, Wei
    Yu, Nanpeng
    Gao, Yuanqi
    Shi, Jie
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (04) : 3008 - 3018