INVESTIGATING THE USE OF REINFORCEMENT LEARNING FOR MULTI-FIDELITY MODEL SELECTION IN THE CONTEXT OF DESIGN DECISION MAKING

Cited by: 0
Authors
Chhabra, Jaskanwal P. S. [1 ]
Warn, Gordon P. [1 ]
Affiliations
[1] Penn State Univ, Dept Civil & Environm Engn, University Pk, PA 16802 USA
Source
PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2018, VOL 2B | 2018
Funding
U.S. National Science Foundation;
Keywords
Reinforcement learning; Tradespace; Decision Making; Sequential Decision Process; Design; Multi-Fidelity;
DOI
Not available
Chinese Library Classification (CLC)
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
Engineers often employ, formally or informally, multi-fidelity computational models to aid design decision making. For example, the recently proposed idea of viewing design as a Sequential Decision Process (SDP) provides a formal framework for sequencing multi-fidelity models to realize computational gains in the design process. Efficiency is achieved in the SDP because dominated designs are removed using less expensive (low-fidelity) models before higher-fidelity models are applied, with the guarantee that an antecedent model only removes design solutions that would also be dominated when analyzed using more detailed, higher-fidelity models. The set of multi-fidelity models and discrete decision states results in a combinatorial number of possible modeling sequences, some of which require significantly fewer model evaluations than others. It is desirable to sequence the models optimally; however, the optimal modeling policy cannot be determined at the onset of the SDP because the computational cost and discriminatory power of executing each model on the designs are unknown a priori. In this study, the model selection problem is formulated as a Markov Decision Process, and a classical reinforcement learning algorithm, namely Q-learning, is investigated to obtain and follow an approximately optimal modeling policy. The outcome is a methodology that learns efficient sequencing of models by estimating their computational cost and discriminatory power while analyzing designs in the tradespace throughout the design process. Through application to a design example, the methodology is shown to: 1) effectively identify the approximately optimal modeling policy, and 2) efficiently converge upon a choice set.
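The core idea in the abstract, screening designs with cheaper models first so that fewer designs reach expensive models, and learning that ordering with Q-learning, can be illustrated with a small tabular sketch. Everything below is a placeholder assumption for illustration, not the authors' formulation: the model costs, discriminatory powers, state encoding, and reward (negative evaluation cost) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical multi-fidelity models: per-design evaluation cost and the
# fraction of surviving designs each model is assumed to screen out.
# These numbers are illustrative placeholders, not values from the paper.
MODELS = {
    "low":  {"cost": 1.0,   "discrim": 0.50},
    "mid":  {"cost": 10.0,  "discrim": 0.30},
    "high": {"cost": 100.0, "discrim": 0.15},
}
N_DESIGNS = 100            # designs in the tradespace at the start of the SDP
ALPHA, GAMMA, EPS = 0.1, 1.0, 0.1

Q = defaultdict(float)     # Q[(state, action)] -> value

def step(applied, remaining, action):
    """Apply one model to the surviving designs; reward = -evaluation cost."""
    spec = MODELS[action]
    reward = -spec["cost"] * remaining
    # Stochastic number of dominated designs removed by this model.
    removed = sum(random.random() < spec["discrim"] for _ in range(remaining))
    return applied | frozenset([action]), max(remaining - removed, 1), reward

def run_episode():
    applied, remaining, total = frozenset(), N_DESIGNS, 0.0
    while applied != frozenset(MODELS):
        state = (applied, remaining)
        actions = [m for m in MODELS if m not in applied]
        # Epsilon-greedy action selection over the remaining models.
        if random.random() < EPS:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        new_applied, new_remaining, r = step(applied, remaining, action)
        total += r
        next_state = (new_applied, new_remaining)
        next_actions = [m for m in MODELS if m not in new_applied]
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        # Standard one-step Q-learning update.
        Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
        applied, remaining = new_applied, new_remaining
    return total

for _ in range(5000):
    run_episode()

# Greedy policy at the initial state: which model should be executed first?
s0 = (frozenset(), N_DESIGNS)
print(max(MODELS, key=lambda a: Q[(s0, a)]))
```

Under these assumed costs and discriminatory powers, the learned greedy policy applies the cheap model first so that most designs are eliminated before any expensive evaluations, which is the behavior the SDP framework is designed to exploit.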
Pages: 13