Unsupervised reward engineering for reinforcement learning controlled manufacturing

Cited by: 0

Authors:

Hirtz, Thomas [1]

Tian, He [1]

Yang, Yi [1]

Ren, Tian-Ling [1]

Affiliations:

[1] Tsinghua Univ, Sch Integrated Circuits, Beijing 100084, Peoples R China

Source:

JOURNAL OF INTELLIGENT MANUFACTURING | 2024

Keywords:

Artificial intelligence; Semiconductor manufacturing; Reinforcement learning; Reward engineering;

DOI:

10.1007/s10845-024-02491-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reward engineering is a key challenge in reinforcement learning (RL) that can significantly affect the performance and applicability of RL algorithms. In manufacturing, shaping the reward function for RL algorithms can be particularly difficult because of the complex, multi-objective nature of the manufacturing process. To address these challenges, we propose an unsupervised reward engineering method based on a variational autoencoder (VAE) that uses the latent representation of the product to compute the environment's reward. Our approach optimizes the underlying distribution of the fabricated product directly by leveraging the latent space distance or divergence between the manufactured and ideal products. This strategy circumvents issues commonly associated with conventional reward engineering, such as misaligned and hacked rewards. Our technique enables convenient multi-objective optimization and reward value bounding. Through a $\beta$-VAE architecture, we can adjust the weight of the Kullback-Leibler divergence term, prioritizing ideal characteristics or latent distribution based on the desired outcome. Applying our approach to semiconductor manufacturing, we demonstrate its benefits, including effective multi-objective optimization, stable reward, and meaningful data representations. Our method shows promise for optimizing complex manufacturing processes with RL and can be extended to various manufacturing-related fields. It can enhance product quality and offers opportunities for cross-facility manufacturing matching.
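As a rough illustration of the latent-divergence reward the abstract describes, the sketch below computes the Kullback-Leibler divergence between two diagonal Gaussian latent codes (one for the manufactured product, one for the ideal product) and maps it to a bounded reward. This is not the authors' implementation: the function names, the diagonal-Gaussian parameterization, and the `exp(-KL)` bounding are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) for two diagonal Gaussians given as lists of means and log-variances."""
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        # Closed-form KL for one dimension, summed over independent dimensions.
        total += 0.5 * (lq - lp + (math.exp(lp) + (mp - mq) ** 2) / math.exp(lq) - 1.0)
    return total


def latent_reward(mu_made, logvar_made, mu_ideal, logvar_ideal):
    """Hypothetical bounded reward in (0, 1]: equals 1 when the two latent codes coincide.

    The exp(-KL) mapping is an assumption for this sketch, not taken from the paper.
    """
    return math.exp(-kl_diag_gaussians(mu_made, logvar_made, mu_ideal, logvar_ideal))


if __name__ == "__main__":
    ideal = ([0.0, 0.0], [0.0, 0.0])   # latent mean and log-variance of the ideal product
    made = ([1.0, 0.0], [0.0, 0.0])    # latent code of a fabricated product
    print(latent_reward(*made, *ideal))
```

Because every product property is folded into a single latent code, one scalar of this kind can stand in for several objectives at once, and the exponential keeps its value within a fixed range.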


Pages: 14

