Reliability assessment of off-policy deep reinforcement learning: A benchmark for aerodynamics

Cited by: 1
Authors
Berger, Sandrine [1 ]
Ramo, Andrea Arroyo [1 ]
Guillet, Valentin [2 ]
Lahire, Thibault [2 ]
Martin, Brice [2 ]
Jardin, Thierry [1 ]
Rachelson, Emmanuel [2 ]
Affiliations
[1] Univ Toulouse, Dept Aerodynam & Prop, ISAE SUPAERO, Toulouse, France
[2] Univ Toulouse, Dept Complex Syst Engn, ISAE SUPAERO, Toulouse, France
Source
DATA-CENTRIC ENGINEERING | 2024, Vol. 5
Keywords
Benchmark for aerodynamics; computational fluid dynamics; deep reinforcement learning; off-policy algorithms; reliability; NEURAL-NETWORKS; FLOWS;
DOI
10.1017/dce.2023.28
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (DRL) is promising for solving control problems in fluid mechanics, but it is a new field with many open questions. Possibilities are numerous, and guidelines are rare concerning the choice of algorithms or the best formulation for a given problem. Besides, DRL algorithms learn a control policy by collecting samples from an environment, which may be very costly when used with Computational Fluid Dynamics (CFD) solvers. Algorithms must therefore minimize the number of samples required for learning (sample efficiency) and generate a usable policy from each training run (reliability). This paper aims to (a) evaluate three existing algorithms (DDPG, TD3, and SAC) on a fluid mechanics problem with respect to reliability and sample efficiency across a range of training configurations, (b) establish a fluid mechanics benchmark of increasing data collection cost, and (c) provide practical guidelines and insights for the fluid dynamics practitioner. The benchmark consists of controlling an airfoil to reach a target. The problem is solved with either a low-cost low-order model or a high-fidelity CFD approach. The study found that DDPG and TD3 have learning stability issues that depend strongly on the DRL hyperparameters and reward formulation, and therefore require significant tuning. In contrast, SAC is shown to be both reliable and sample efficient across a wide range of parameter setups, making it well suited to solving fluid mechanics problems and setting up new cases without tremendous effort. In particular, SAC is resistant to small replay buffers, which could be critical if full flow fields were to be stored.
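The practical takeaway above (an off-policy agent such as SAC remaining reliable even with a small replay buffer) can be illustrated with a minimal sketch. The sketch below is not the paper's benchmark code: it assumes the gymnasium and stable-baselines3 libraries, and the AirfoilTargetEnv class, its dynamics, and every parameter value are invented stand-ins for the low-order target-reaching task described in the abstract.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC


class AirfoilTargetEnv(gym.Env):
    """Hypothetical low-order stand-in: a controlled point must reach a random target."""

    def __init__(self, dt=0.05, max_steps=200):
        super().__init__()
        self.dt = dt
        self.max_steps = max_steps
        # Observation: controlled point position (x, y) and target position (x, y).
        self.observation_space = spaces.Box(-5.0, 5.0, shape=(4,), dtype=np.float32)
        # Action: commanded velocity components, a crude proxy for airfoil control inputs.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(-1.0, 1.0, size=2).astype(np.float32)
        self.target = self.np_random.uniform(-2.0, 2.0, size=2).astype(np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Simple kinematic update, clipped to the observation bounds.
        self.pos = np.clip(self.pos + self.dt * np.asarray(action, dtype=np.float32), -5.0, 5.0)
        self.steps += 1
        dist = float(np.linalg.norm(self.pos - self.target))
        reward = -dist                      # dense reward: negative distance to the target
        terminated = dist < 0.05            # episode ends when the target is reached
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate([self.pos, self.target]).astype(np.float32)


if __name__ == "__main__":
    env = AirfoilTargetEnv()
    # Deliberately small replay buffer, echoing the abstract's remark that SAC
    # tolerates small buffers (relevant when full flow fields would have to be stored).
    model = SAC("MlpPolicy", env, buffer_size=10_000, seed=0, verbose=1)
    model.learn(total_timesteps=50_000)

Swapping SAC for TD3 or DDPG is a one-line change in stable-baselines3, which is the kind of like-for-like algorithm comparison the abstract reports, leaving aside the hyperparameter sweeps and the CFD-based variant of the environment.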
Pages: 32
Related papers
50 records in total
  • [1] Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration
    Cheng, Yuhu
    Chen, Lin
    Chen, C. L. Philip
    Wang, Xuesong
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2021, 13 (04) : 1023 - 1032
  • [2] Off-Policy Differentiable Logic Reinforcement Learning
    Zhang, Li
    Li, Xin
    Wang, Mingzhong
    Tian, Andong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976 : 617 - 632
  • [3] Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning
    Yang, Yana
    Xi, Meng
    Dai, Huiao
    Wen, Jiabao
    Yang, Jiachen
    SENSORS, 2024, 24 (23)
  • [4] Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding
    Tan, Xiaoyu
    Qu, Chao
    Xiong, Junwu
    Zhang, James
    Qiu, Xihe
    Jin, Yaochu
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04): : 2974 - 2986
  • [5] An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning
    Meng, Wenjia
    Zheng, Qian
    Shi, Yue
    Pan, Gang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (05) : 2223 - 2235
  • [6] A multi-step on-policy deep reinforcement learning method assisted by off-policy policy evaluation
    Zhang, Huaqing
    Ma, Hongbin
    Mersha, Bemnet Wondimagegnehu
    Jin, Ying
    APPLIED INTELLIGENCE, 2024, 54 (21) : 11144 - 11159
  • [7] Traffic Signal Control Using End-to-End Off-Policy Deep Reinforcement Learning
    Chu, Kai-Fung
    Lam, Albert Y. S.
    Li, Victor O. K.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (07) : 7184 - 7195
  • [8] An off-policy deep reinforcement learning-based active learning for crime scene investigation image classification
    Zhang, Yixin
    Liu, Yang
    Jiang, Guofan
    Yang, Yuchen
    Zhang, Jian
    Jing, Yang
    Alizadehsani, Roohallah
    Tadeusiewicz, Ryszard
    Plawiak, Pawel
    INFORMATION SCIENCES, 2025, 710
  • [9] Off-policy deep reinforcement learning with automatic entropy adjustment for adaptive online grid emergency control
    Zhang, Ying
    Yue, Meng
    Wang, Jianhui
    ELECTRIC POWER SYSTEMS RESEARCH, 2023, 217
  • [10] Safe Off-Policy Deep Reinforcement Learning Algorithm for Volt-VAR Control in Power Distribution Systems
    Wang, Wei
    Yu, Nanpeng
    Gao, Yuanqi
    Shi, Jie
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (04) : 3008 - 3018