A unified algorithm framework for mean-variance optimization in discounted Markov decision processes

Times cited: 4

Authors:

Ma, Shuai [1]

Ma, Xiaoteng [2]

Xia, Li [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Business, Guangzhou 510275, Peoples R China

[2] Tsinghua Univ, Dept Automat, Beijing 100086, Peoples R China

Source:

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH | 2023 / Vol. 311 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Dynamic programming; Markov decision process; Discounted mean-variance; Bilevel optimization; Bellman local-optimality equation; PORTFOLIO SELECTION; PROSPECT-THEORY; TRADEOFFS; MODEL; RISK;

DOI:

10.1016/j.ejor.2023.06.022

Chinese Library Classification (CLC) number:

C93 [Management Science];

Subject classification codes:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric involved concerns reward variability over the whole process, and future deviations are discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on a discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it breaks a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean to transform the untreatable MDP into a standard one with a redefined reward function in standard form, and we derive a discounted mean-variance performance difference formula. With the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimization in discounted and average MDPs. Furthermore, the framework can also supply the convergence analyses missing from the literature. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm. © 2023 Elsevier B.V. All rights reserved.
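The pseudo-mean idea in the abstract can be illustrated with a small tabular sketch. It rests on assumptions of ours, not definitions taken from the paper: the model (transition lists P, rewards R, initial distribution rho) is known, J = E[sum_t gamma^t r_t], the normalized discounted mean is mu = (1 - gamma) J, and the discounted variance is E[sum_t gamma^t (r_t - mu)^2]. Under these definitions, for any constant y, E[sum_t gamma^t (r_t - y)^2] equals the variance plus (mu - y)^2 / (1 - gamma). So the objective J - beta * variance is the maximum over y of E[sum_t gamma^t (r_t - beta (r_t - y)^2)]. Freezing y (the pseudo mean) leaves an ordinary MDP that value iteration can solve, and resetting y to the new policy's mu never lowers the objective. This alternating loop is one possible bilevel scheme, not the authors' exact algorithm; all function names are hypothetical.

```python
# Hypothetical sketch (not the authors' code): pseudo-mean alternating
# optimization for a tabular discounted mean-variance MDP with a known model.
# Assumed definitions: J = E[sum_t gamma^t r_t], mu = (1 - gamma) * J,
# variance = E[sum_t gamma^t (r_t - mu)^2], objective = J - beta * variance.


def _q(P, F, V, gamma, s, a):
    """One-step lookahead value of action a in state s under reward F."""
    return F[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))


def value_iteration(P, F, gamma, tol=1e-10, max_iter=100_000):
    """Standard value iteration for reward F[s][a]; returns a greedy policy."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(_q(P, F, V, gamma, s, a) for a in range(len(P[s])))
                 for s in range(len(P))]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    return [max(range(len(P[s])), key=lambda a: _q(P, F, V, gamma, s, a))
            for s in range(len(P))]


def evaluate(P, G, policy, rho, gamma, tol=1e-12, max_iter=100_000):
    """E[sum_t gamma^t G(X_t, A_t)] for a deterministic policy, X_0 ~ rho."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [_q(P, G, V, gamma, s, policy[s]) for s in range(len(P))]
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    return sum(w * v for w, v in zip(rho, V))


def mean_variance(P, R, policy, rho, gamma, beta):
    """Return (J - beta * variance, mu) for a fixed policy."""
    J = evaluate(P, R, policy, rho, gamma)
    mu = (1 - gamma) * J
    sq_dev = [[(r - mu) ** 2 for r in row] for row in R]
    return J - beta * evaluate(P, sq_dev, policy, rho, gamma), mu


def pseudo_mean_optimize(P, R, rho, gamma, beta, y=0.0, max_outer=100):
    """Alternate a standard MDP solve (y fixed) with a pseudo-mean update."""
    policy, history = None, []
    for _ in range(max_outer):
        # Inner level: with y frozen, r - beta * (r - y)^2 is an ordinary reward.
        F = [[r - beta * (r - y) ** 2 for r in row] for row in R]
        new_policy = value_iteration(P, F, gamma)
        value, mu = mean_variance(P, R, new_policy, rho, gamma, beta)
        history.append(value)
        if new_policy == policy:
            break
        # Outer level: move the pseudo mean to the new policy's true mean.
        policy, y = new_policy, mu
    return policy, y, history


if __name__ == "__main__":
    # Toy model: in state 0, action 0 earns 1 and stays; action 1 earns 0 and
    # moves to state 1, which earns 4 and returns to state 0.
    P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]]
    R = [[1.0, 0.0], [4.0]]
    for beta in (0.0, 1.0):
        policy, y, history = pseudo_mean_optimize(P, R, [1.0, 0.0], 0.9, beta)
        print(beta, policy, round(y, 4), [round(h, 4) for h in history])
```

In this toy model the risk-neutral setting (beta = 0) prefers the alternating 0/4 rewards, while beta = 1 switches to the constant reward of 1. Stopping when the policy repeats is a simple illustrative choice; the paper's own stopping rule and its convergence argument via the Bellman local-optimality equation may differ.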


Pages: 1057-1067

Number of pages: 11

References (40 in total):

[11] Di Castro D., 2012, P 29 INT C MACHINE L, P935
[12] Eckstein, Jonathan; Eskandani, Deniz; Fan, Jingnan. Multilevel Optimization Modeling for Risk-Averse Stochastic Programming [J]. INFORMS JOURNAL ON COMPUTING, 2016, 28(01): 112-128
[13] FILAR, JA; KALLENBERG, LCM; LEE, HM. VARIANCE-PENALIZED MARKOV DECISION-PROCESSES [J]. MATHEMATICS OF OPERATIONS RESEARCH, 1989, 14(01): 147-161
[14] Gosavi, Abhijit. Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques [J]. INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2014, 43(06): 649-669
[15] Guo, Xianping; Ye, Liuer; Yin, George. A mean-variance optimization problem for discounted Markov decision processes [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2012, 220(02): 423-429
[16] Harrison, Christopher A.; Qin, S. Joe. Minimum variance performance map for constrained model predictive control [J]. JOURNAL OF PROCESS CONTROL, 2009, 19(07): 1199-1204
[17] Hernández-Lerma, O; Vega-Amaya, O; Carrasco, G. Sample-path optimality and variance-minimization of average cost Markov control processes [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1999, 38(01): 79-93
[18] Ho Y. C., 1991, PERTURBATION ANAL DI
[19] Kandel, Shmuel; Stambaugh, Robert F. A Mean-Variance Framework for Tests of Asset Pricing Models [J]. REVIEW OF FINANCIAL STUDIES, 1989, 2(02): 125-156
[20] Kouvelis, Panos; Pang, Zhan; Ding, Qing. Integrated Commodity Inventory Management and Financial Hedging: A Dynamic Mean-Variance Analysis [J]. PRODUCTION AND OPERATIONS MANAGEMENT, 2018, 27(06): 1052-1073
