Online estimation of objective function for continuous-time deterministic systems

Cited by: 2
Authors
Asl, Hamed Jabbari [1 ]
Uchibe, Eiji [1 ]
Affiliations
[1] ATR Computational Neuroscience Laboratories, Department of Brain Robot Interface, 2-2-2 Hikaridai, Seikacho, Soraku-gun, Kyoto 619-0288, Japan
Keywords
Objective function estimation; Deterministic systems; Data-driven solution; Continuous-time systems
DOI
10.1016/j.neunet.2024.106116
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We developed two online data-driven methods for estimating an objective function in continuous-time linear and nonlinear deterministic systems. The primary focus is the challenge posed by unknown input dynamics (the control mapping function) in the expert system, a critical element for an online solution of the problem. Our methods leverage both the learner's and the expert's data for effective problem-solving. The first approach, which is model-free, estimates the expert's policy and integrates it into the learner agent to approximate the objective function associated with the optimal policy. The second approach estimates the input dynamics from the learner's data and combines the estimate with the expert's input-state observations to solve the objective function estimation problem. Compared with other methods for deterministic systems that rely on both the learner's and the expert's data, our approaches offer reduced complexity by eliminating the need to estimate an optimal policy after each objective function update. We conduct a convergence analysis of the estimation techniques using Lyapunov-based methods. Numerical experiments validate the effectiveness of the developed methods.
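Only the abstract is available in this record, so the following is a minimal sketch of the flavor of the second approach for the linear-quadratic case, not the paper's algorithm: the learner estimates the input dynamics B from its own data by least squares and then combines that estimate with the expert's input-state observations, recovering the cost weight Q from the Hamilton-Jacobi-Bellman and stationarity residuals. The system matrices, the choice R = I, the diagonal parameterization of Q, and the batch least-squares estimators are all illustrative assumptions; the paper itself develops online estimators with a Lyapunov-based convergence analysis.

# Illustrative inverse-LQR sketch (assumed setup, not the paper's method).
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# Assumed expert system x_dot = A x + B u with hidden cost weight Q_true, R = I.
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B_true = np.array([[0.0],
                   [1.0]])
Q_true = np.diag([2.0, 1.0])

# Expert's optimal gain from the continuous-time algebraic Riccati equation.
P_true = solve_continuous_are(A, B_true, Q_true, np.eye(1))
K_true = B_true.T @ P_true            # K = R^{-1} B' P with R = I

# Step 1: estimate the input dynamics B from the learner's own (x, u, x_dot) data.
Xl = rng.standard_normal((100, 2))    # learner states
Ul = rng.standard_normal((100, 1))    # exploratory learner inputs
Xdotl = Xl @ A.T + Ul @ B_true.T      # noiseless derivatives, for clarity
Bt, *_ = np.linalg.lstsq(Ul, Xdotl - Xl @ A.T, rcond=None)
B_hat = Bt.T                          # estimate of B

# Step 2: recover Q from the expert's input-state observations. For
# V(x) = x' P x, the expert's HJB and stationarity conditions read
#   2 x' P (A x + B u) + x' Q x + u' R u = 0   and   B' P x + R u = 0,
# which are linear in the entries of P and (diagonal) Q once R is fixed.
Xe = rng.standard_normal((100, 2))    # observed expert states
Ue = -(Xe @ K_true.T)                 # observed expert inputs u = -K x
b1, b2 = B_hat[0, 0], B_hat[1, 0]
rows, rhs = [], []
for x, u in zip(Xe, Ue):
    z = A @ x + B_hat @ u
    # unknown vector: [p11, p12, p22, q1, q2]
    rows.append([2*x[0]*z[0], 2*(x[0]*z[1] + x[1]*z[0]), 2*x[1]*z[1],
                 x[0]**2, x[1]**2])                               # HJB row
    rhs.append(-float(u @ u))
    rows.append([b1*x[0], b2*x[0] + b1*x[1], b2*x[1], 0.0, 0.0])  # stationarity row
    rhs.append(-float(u[0]))
theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print("estimated Q:\n", np.diag(theta[3:5]))  # approximately Q_true

Restricting Q to a diagonal parameterization here sidesteps the well-known ambiguity of inverse optimal control, in which a whole family of (P, Q) pairs explains the same expert policy; without such a restriction, the least-squares problem above is rank-deficient.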
Pages: 11