Adaptive policies for time-varying stochastic systems under discounted criterion

Times cited: 9

Authors:

Hilgert, N

Minjárez-Sosa, JA

Affiliations:

[1] ENSAM, INRA, Lab Biometrie, F-34060 Montpellier 1, France

[2] Univ Sonora, Dept Matemat, Hermosillo 83000, Sonora, Mexico

Source:

MATHEMATICAL METHODS OF OPERATIONS RESEARCH | 2001 / Volume 54 / Issue 03

Keywords:

non-homogeneous Markov control processes; discrete-time stochastic systems; discounted cost criterion; optimal adaptive policy;

DOI:

10.1007/s001860100170

Chinese Library Classification:

C93 [Management Science]; O22 [Operations Research];

Discipline classification codes:

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

Abstract:

We consider a class of time-varying stochastic control systems, with Borel state and action spaces, and possibly unbounded costs. The processes evolve according to a discrete-time equation $x_{n+1} = G_n(x_n, a_n, \xi_n)$, $n = 0, 1, \ldots$, where the $\xi_n$ are i.i.d. $\mathbb{R}^k$-valued random vectors whose common density is unknown, and the $G_n$ are given functions converging, in a restricted way, to some function $G_\infty$ as $n \to \infty$. Assuming observability of $\xi_n$, we construct an adaptive policy which is asymptotically discounted cost optimal for the limiting control system $x_{n+1} = G_\infty(x_n, a_n, \xi_n)$.
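The setting in the abstract can be illustrated with a small simulation. The following is only a sketch under assumptions and is not the construction from the paper: the scalar dynamics `g_n` and `g_limit`, the feedback rule, the Gaussian noise, and the histogram estimator are all hypothetical stand-ins. It shows a time-varying map converging to a limiting map, while the observed disturbances are collected to estimate their unknown density.

```python
import random


def g_limit(x, a, xi):
    """Hypothetical limiting dynamics, standing in for G_infinity."""
    return 0.5 * x + a + xi


def g_n(n, x, a, xi):
    """Hypothetical time-varying dynamics, standing in for G_n.

    The extra coefficient 1/(n+2)^2 vanishes as n grows, so g_n
    approaches g_limit.
    """
    return (0.5 + 1.0 / (n + 2) ** 2) * x + a + xi


def simulate(steps, seed=0):
    """Run the time-varying system and record the observed disturbances."""
    rng = random.Random(seed)
    x = 1.0
    noise_samples = []
    gaps = []
    for n in range(steps):
        a = -0.25 * x             # placeholder feedback rule, not the paper's policy
        xi = rng.gauss(0.0, 1.0)  # true density is hidden from the controller
        noise_samples.append(xi)  # xi_n is assumed to be observable
        nxt = g_n(n, x, a, xi)
        gaps.append(abs(nxt - g_limit(x, a, xi)))  # distance to limiting dynamics
        x = nxt
    return x, noise_samples, gaps


def histogram_density(samples, lo, hi, bins):
    """Histogram estimate of the disturbance density on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[int((s - lo) / width)] += 1
    return [c / (len(samples) * width) for c in counts]


if __name__ == "__main__":
    _, noise, gaps = simulate(2000)
    density = histogram_density(noise, -5.0, 5.0, 20)
    print("final gap to limiting dynamics:", gaps[-1])
    print("estimated density near 0:", density[10])
```

In an adaptive scheme of the kind the abstract describes, an estimate like `density` would be fed back into the choice of actions; here the action rule is fixed purely to keep the sketch short.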

Citation

Pages: 491-505

Page count: 15

References: 18 in total

[1]

Bastin G, 1990, ON LINE ESTIMATION A, V1

[2]

Duflo M, 1997, Random Iterative Models

[3]

Dynkin EB, 1979, Grundlehren der Mathematischen Wissenschaften, V235

[4]

Gordienko EI, 1998, KYBERNETIKA, V34, P217