Composing Synergistic Macro Actions for Reinforcement Learning Agents

Times cited: 0
Authors
Chen, Yu-Ming [1 ]
Chang, Kaun-Yu [2 ]
Liu, Chien [3 ]
Hsiao, Tsu-Ching [4 ]
Hong, Zhang-Wei [5 ]
Lee, Chun-Yi [4 ]
Affiliations
[1] Taiwan Semicond Mfg Co TSMC, Hsinchu 300, Taiwan
[2] Avery Design Syst Inc, Taipei 100, Taiwan
[3] Univ Rostock, Fac Comp Sci & Elect Engn, D-18051 Rostock, Germany
[4] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[5] MIT, Cambridge, MA 02139 USA
Keywords
Task analysis; Reinforcement learning; Markov processes; Learning systems; Behavioral sciences; Artificial neural networks; Planning; Macro actions; Markov decision process (MDP); neural architecture search (NAS); reinforcement learning (RL); synergism; NEURAL-NETWORKS;
DOI
10.1109/TNNLS.2022.3213606
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Macro actions have been demonstrated to benefit the learning processes of an agent, and a variety of techniques have been developed to construct more effective ones. However, previous techniques usually do not consider combining macro actions into a synergistic macro action ensemble, in which synergism arises when the constituent macro actions are favorable for joint use by an agent during evaluation. Such a synergistic macro action ensemble may allow an agent to perform even better than it would with the individual macro actions within it. Motivated by recent advances in neural architecture search (NAS), in this brief, we formulate the construction of a synergistic macro action ensemble as a Markov decision process (MDP) and evaluate the constructed macro action ensemble as a whole. This problem formulation enables the proposed evaluation procedure to take synergism into account. Our experimental results demonstrate that the proposed framework is able to discover synergistic macro action ensembles. Furthermore, we highlight the benefits of these macro action ensembles through a set of analytical cases.
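The abstract states that ensemble construction is cast as an MDP whose result is evaluated as a whole, but it does not define the states, actions, or reward. The following Python sketch assumes one plausible encoding, purely for illustration: each action appends a primitive action to the macro being built, closes that macro, or ends the ensemble, and the only nonzero reward is the score of the complete ensemble at termination. All names (EnsembleConstructionMDP, END_MACRO, END_ENSEMBLE, synergy_gain, toy_evaluate) are hypothetical and not taken from the paper; a real evaluator would presumably train and test an agent that can use the macros, which toy_evaluate merely stands in for.

```python
from typing import Callable, List, Tuple

Macro = Tuple[int, ...]          # a fixed sequence of primitive action indices
Evaluator = Callable[[List[Macro]], float]


class EnsembleConstructionMDP:
    """Hypothetical MDP in which each episode builds one macro action ensemble.

    Actions 0..num_primitives-1 append a primitive to the macro under construction,
    END_MACRO closes that macro, and END_ENSEMBLE terminates the episode.
    The only nonzero reward is given at termination: the score of the whole
    ensemble, so interactions between its macros (synergism) are reflected in it.
    """

    def __init__(self, num_primitives: int, max_macros: int,
                 max_macro_len: int, evaluate: Evaluator) -> None:
        self.num_primitives = num_primitives
        self.END_MACRO = num_primitives
        self.END_ENSEMBLE = num_primitives + 1
        self.max_macros = max_macros
        self.max_macro_len = max_macro_len
        self.evaluate = evaluate
        self.reset()

    def reset(self) -> Tuple[Tuple[Macro, ...], Macro]:
        self.macros: List[Macro] = []
        self.current: List[int] = []
        return tuple(self.macros), tuple(self.current)

    def _close_macro(self) -> None:
        # Assumption: length-1 "macros" duplicate a primitive and repeats add nothing.
        macro = tuple(self.current)
        if len(macro) >= 2 and macro not in self.macros:
            self.macros.append(macro)
        self.current = []

    def step(self, action: int):
        if 0 <= action < self.num_primitives:
            self.current.append(action)
            if len(self.current) == self.max_macro_len:
                self._close_macro()  # auto-close a macro that reached the length cap
        elif action in (self.END_MACRO, self.END_ENSEMBLE):
            self._close_macro()
        else:
            raise ValueError(f"invalid action {action}")
        done = action == self.END_ENSEMBLE or len(self.macros) >= self.max_macros
        # Sparse reward: the ensemble is evaluated as a whole only at the end.
        reward = self.evaluate(list(self.macros)) if done else 0.0
        return (tuple(self.macros), tuple(self.current)), reward, done


def synergy_gain(ensemble: List[Macro], evaluate: Evaluator) -> float:
    """Joint score of the ensemble minus the best score of any single macro alone."""
    best_single = max(evaluate([m]) for m in ensemble)
    return evaluate(ensemble) - best_single


def toy_evaluate(ensemble: List[Macro]) -> float:
    # Hypothetical stand-in for training and evaluating an RL agent.
    score = float(len(ensemble))
    if (0, 1) in ensemble and (2, 3) in ensemble:
        score += 5.0  # these two macros are only valuable when used together
    return score


env = EnsembleConstructionMDP(num_primitives=4, max_macros=2,
                              max_macro_len=3, evaluate=toy_evaluate)
env.reset()
for a in [0, 1, env.END_MACRO, 2, 3, env.END_MACRO]:
    state, reward, done = env.step(a)
print(state, reward, done)
print(synergy_gain(list(state[0]), toy_evaluate))  # 6.0
```

Scoring only the finished ensemble, rather than each macro in isolation, is what lets a joint bonus like the one in toy_evaluate show up in the reward; synergy_gain then measures how much the ensemble outperforms its best single member, in the sense the abstract describes.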
Pages: 7251-7258
Page count: 8