A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

Cited by: 4
Authors
Ma, Shuai [1]
Ma, Xiaoteng [2]
Xia, Li [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Business, Guangzhou 510275, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100086, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Dynamic programming; Markov decision process; Discounted mean-variance; Bilevel optimization; Bellman local-optimality equation; Portfolio selection; Prospect theory; Tradeoffs; Model; Risk
DOI
10.1016/j.ejor.2023.06.022
Chinese Library Classification (CLC) number
C93 [Management science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric involved concerns reward variability over the whole process, with future deviations discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it breaks a crucial property: time consistency. To handle this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function in standard form, and we derive a discounted mean-variance performance difference formula. Using the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimization in discounted and average MDPs. Furthermore, the framework can supply convergence analyses that are missing from the literature. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm. © 2023 Elsevier B.V. All rights reserved.
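The pseudo-mean and bilevel idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the definitions below are assumptions chosen for illustration, namely normalized discount weights (1 − γ)γ^t, a deterministic reward r(s, a), a fixed initial distribution rho, discounted mean μ = E[Σ_t (1 − γ)γ^t r_t] and discounted variance σ² = E[Σ_t (1 − γ)γ^t (r_t − μ)²]. Under these assumptions, for any constant y, E[Σ_t (1 − γ)γ^t (r_t − y)²] = σ² + (μ − y)², so σ² is the minimum of this pseudo variance over y, attained at y = μ. Fixing y therefore turns μ − λσ² into an ordinary discounted MDP with reward r − λ(r − y)² (the inner level, solved by plain value iteration), and resetting y to the new policy's mean is the outer level. All function names (`evaluate`, `mean_and_variance`, `greedy_value_iteration`, `mean_variance_vi`) and the list-based MDP layout are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical tabular layout (an assumption, not taken from the paper):
#   P[s][a][t] = probability of moving from state s to state t under action a
#   R[s][a]    = deterministic one-step reward r(s, a)
Transitions = Sequence[Sequence[Sequence[float]]]
Rewards = Sequence[Sequence[float]]


def evaluate(P: Transitions, gamma: float, policy: List[int],
             reward: Callable[[int, int], float], tol: float = 1e-12) -> List[float]:
    """Iterative policy evaluation of E[sum_t gamma^t * reward(s_t, a_t) | s_0 = s]."""
    V = [0.0] * len(P)
    while True:
        V_new = [reward(s, policy[s])
                 + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
                 for s in range(len(P))]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def mean_and_variance(P: Transitions, R: Rewards, gamma: float, policy: List[int],
                      rho: Sequence[float]) -> Tuple[float, float]:
    """Assumed discounted mean and variance with normalized weights (1 - gamma) * gamma^t."""
    V_r = evaluate(P, gamma, policy, lambda s, a: R[s][a])
    mu = (1 - gamma) * sum(w * v for w, v in zip(rho, V_r))
    V_dev = evaluate(P, gamma, policy, lambda s, a: (R[s][a] - mu) ** 2)
    var = (1 - gamma) * sum(w * v for w, v in zip(rho, V_dev))
    return mu, var


def greedy_value_iteration(P: Transitions, R_tilde: Rewards, gamma: float,
                           tol: float = 1e-10) -> List[int]:
    """Ordinary value iteration on a fixed reward; returns a greedy deterministic policy."""
    def q(V: List[float], s: int, a: int) -> float:
        return R_tilde[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))

    V = [0.0] * len(P)
    while True:
        V_new = [max(q(V, s, a) for a in range(len(P[s]))) for s in range(len(P))]
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    return [max(range(len(P[s])), key=lambda a: q(V, s, a)) for s in range(len(P))]


def mean_variance_vi(P: Transitions, R: Rewards, gamma: float, lam: float,
                     rho: Sequence[float], max_outer: int = 100):
    """Hypothetical bilevel loop: fix the pseudo mean y, solve a standard MDP, reset y."""
    policy = [0] * len(P)  # arbitrary initial deterministic policy
    mu, var = mean_and_variance(P, R, gamma, policy, rho)
    for _ in range(max_outer):
        y = mu  # pseudo mean = discounted mean of the current policy
        # Inner level: standard MDP with the redefined reward r - lam * (r - y)^2.
        R_tilde = [[r - lam * (r - y) ** 2 for r in R_s] for R_s in R]
        new_policy = greedy_value_iteration(P, R_tilde, gamma)
        if new_policy == policy:  # fixed point of the alternation
            break
        policy = new_policy
        mu, var = mean_and_variance(P, R, gamma, policy, rho)
    return policy, mu, var
```

In exact arithmetic each step can only increase μ − λ(σ² + (μ − y)²), so μ − λσ² is nondecreasing across outer iterations, and the loop stops at a fixed point that need not be globally optimal. As a toy usage example, take two states with rho = [1, 0], γ = 0.9, a "safe" action in state 0 that pays 1 and stays put, and a "risky" action that pays 0 and moves to state 1, which pays 3 and returns. With λ = 0 the sketch picks the alternating 0, 3, 0, 3 reward stream (discounted mean 2.7/1.9 ≈ 1.42), while with λ = 1 it keeps the constant reward of 1.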
Pages: 1057–1067
Number of pages: 11