Risk-Sensitive Policy with Distributional Reinforcement Learning

Times Cited: 3
Authors
Theate, Thibaut [1 ]
Ernst, Damien [1 ,2 ]
Affiliations
[1] Univ Liege, Dept Elect Engn & Comp Sci, B-4031 Liege, Belgium
[2] Inst Polytech Paris, Informat Proc & Commun Lab, F-91120 Paris, France
Keywords
distributional reinforcement learning; sequential decision-making; risk-sensitive policy; risk management; deep neural network
DOI
10.3390/a16070325
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Classical reinforcement learning (RL) techniques are generally concerned with designing decision-making policies that maximise the expected outcome. However, this approach does not account for the potential risk associated with the actions taken, which may be critical in certain applications. To address this issue, the present research work introduces a novel methodology based on distributional RL for deriving sequential decision-making policies that are sensitive to risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the Q function, which typically sits at the core of RL learning schemes, with another function that takes into account both the expected return and the risk. Named the risk-based utility function U, it can be extracted from the random return distribution Z naturally learnt by any distributional RL algorithm. This enables spanning the complete trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, with an emphasis on the interpretability of the resulting decision-making process.
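The paper's exact definition of U is given in the full text; as a minimal sketch of the idea, one common instantiation combines the mean of the learnt return distribution Z with a tail-risk measure such as the conditional value-at-risk (CVaR), weighted by a trade-off parameter. The function name `risk_based_utility` and the parameters `alpha` and `lam` below are illustrative assumptions, not the authors' notation.

```python
import numpy as np

def risk_based_utility(quantiles, alpha=0.1, lam=0.5):
    """Illustrative risk-based utility U(s, a) built from samples or
    quantiles approximating the random return distribution Z(s, a).

    alpha : tail fraction defining the risk measure (CVaR level)
    lam   : trade-off weight; 0 = pure expected return (classical Q),
            1 = fully risk-averse (tail of the distribution only)
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    expected = q.mean()                       # E[Z], the classical Q value
    k = max(1, int(np.ceil(alpha * len(q))))  # size of the worst-case tail
    cvar = q[:k].mean()                       # mean of the alpha worst outcomes
    return (1.0 - lam) * expected + lam * cvar

# A risk-sensitive greedy policy then picks argmax over actions of U
# instead of Q; sweeping lam spans the risk/return trade-off.
```

With `lam = 0` the policy reduces to the usual expectation-maximising one, which matches the abstract's claim that the full trade-off between expected return maximisation and risk minimisation can be spanned.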
Pages: 16