Feature Selection for Data Classification in the Semiconductor Industry by a Hybrid of Simplified Swarm Optimization

Cited by: 2
Authors
Yeh, Wei-Chang [1]
Chu, Chia-Li [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Ind Engn & Engn Management, POB 24-60, Hsinchu 300, Taiwan
Keywords
hybrid feature selection; simplified swarm optimization; semiconductor manufacturing; nearest neighbor rule; fault detection; genetic algorithm; information; patterns
DOI
10.3390/electronics13122242
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In semiconductor manufacturing, high yield is pivotal to staying competitive. Semiconductor manufacturing processes generate large volumes of high-dimensional, non-linear, and imbalanced data, so it becomes necessary to go beyond traditional approaches and adopt machine learning methods. Non-linear classification models allow anomalies to be detected closer to real time, which in turn supports deeper analysis of their root causes. Because production line data in semiconductor manufacturing are high-dimensional, dimensionality reduction is needed to suppress noise and lower computational cost. Feature selection is one of the primary methods for reducing data dimensionality. Wrapper-based heuristic algorithms, despite their high time complexity, often perform well in specific cases, and when combined into hybrid methods they can satisfy data quality and computational cost requirements at the same time. Accordingly, this study proposes a two-stage feature selection model. First, redundant features are eliminated using mutual information to reduce the feature space. Next, a Simplified Swarm Optimization (SSO) algorithm with a purpose-designed fitness function selects the optimal feature subset from the candidate features. Finally, support vector machines serve as the classification model for validation. In practical cases of wafer anomaly classification, the proposed feature selection method achieves higher classification accuracy with fewer features. Its performance on public datasets further supports the effectiveness and generalization capability of the proposed approach.
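The two-stage idea in the abstract (a mutual information filter followed by a Simplified Swarm Optimization wrapper) might be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the fitness function, the parameter values, or how redundancy is measured. Here the filter ranks discrete features by their mutual information with the label, the wrapper uses the commonly described SSO update rule with hypothetical thresholds `cw`, `cp`, `cg`, and a leave-one-out 1-nearest-neighbour accuracy with a small size penalty stands in for the SVM-based fitness. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mi_filter(X, y, keep):
    """Stage 1 (assumed form): keep the `keep` features with the highest MI with y."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def loo_1nn_accuracy(X, y, mask):
    """Stand-in classifier score (NOT the SVM of the paper): leave-one-out 1-NN accuracy."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in cols))
        hits += y[nearest] == y[i]
    return hits / len(X)

def sso_select(fitness, n_features, n_particles=10, n_iter=30,
               cw=0.1, cp=0.4, cg=0.9, seed=0):
    """Stage 2: binary SSO. cw, cp, cg are assumed thresholds, not the paper's values."""
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i, p in enumerate(swarm):
            for j in range(n_features):
                rho = rng.random()
                if rho < cw:
                    pass                      # keep the current value
                elif rho < cp:
                    p[j] = pbest[i][j]        # copy from the particle's own best
                elif rho < cg:
                    p[j] = gbest[j]           # copy from the global best
                else:
                    p[j] = rng.randint(0, 1)  # random restart of this bit
            f = fitness(p)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = p[:], f
                if f > gbest_fit:
                    gbest, gbest_fit = p[:], f
    return gbest, gbest_fit

# Toy data (synthetic, for illustration only): feature 0 equals the label,
# feature 3 is a noisy copy of it, the rest are random.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[label, data_rng.randint(0, 2), data_rng.randint(0, 2),
      label ^ (data_rng.random() < 0.2), data_rng.randint(0, 2)] for label in y]

kept = mi_filter(X, y, keep=3)                      # stage 1: filter
X_kept = [[row[j] for j in kept] for row in X]
fit = lambda mask: loo_1nn_accuracy(X_kept, y, mask) - 0.01 * sum(mask)
best_mask, best_fit = sso_select(fit, n_features=len(kept))  # stage 2: wrapper
selected = [kept[j] for j, m in enumerate(best_mask) if m]
print(selected, round(best_fit, 3))
```

The filter stage shrinks the search space cheaply, so the more expensive wrapper only evaluates subsets of the surviving candidates. The size penalty in `fit` is one simple way to favour smaller subsets and is an assumption of this sketch.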
Pages: 20