Augmented Industrial Data-Driven Modeling Under the Curse of Dimensionality

Cited by: 20

Authors:

Jiang, Xiaoyu ^{[1,2]}

Kong, Xiangyin ^{[1,2]}

Ge, Zhiqiang ^{[1,2]}

Affiliations:

[1] Zhejiang Univ, Coll Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou 310027, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China

Source:

IEEE-CAA JOURNAL OF AUTOMATICA SINICA | 2023 / Vol. 10 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Curse of dimensionality; data augmentation; data-driven modeling; industrial processes; machine learning; FAULT-DIAGNOSIS; KERNEL;

DOI:

10.1109/JAS.2023.123396

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The curse of dimensionality refers to the problem of increased sparsity and computational complexity when dealing with high-dimensional data. In recent years, the types and variables of industrial data have increased significantly, making data-driven models more challenging to develop. To address this problem, data augmentation technology has been introduced as an effective tool to solve the sparsity problem of high-dimensional industrial data. This paper systematically explores and discusses the necessity, feasibility, and effectiveness of augmented industrial data-driven modeling in the context of the curse of dimensionality and virtual big data. Then, the process of data augmentation modeling is analyzed, and the concept of data boosting augmentation is proposed. Data boosting augmentation involves designing the reliability weight and actual-virtual weight functions, and developing a double-weighted partial least squares model to optimize the three stages of data generation, data fusion, and modeling. This approach significantly improves the interpretability, effectiveness, and practicality of data augmentation in industrial modeling. Finally, the proposed method is verified using practical examples of fault diagnosis systems and virtual measurement systems in industry. The results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of data-driven models, making them more suitable for real-world industrial applications.
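The sparsity effect named in the first sentence of the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `mean_nearest_neighbor_distance`, the uniform sampling in the unit hypercube, and the sample sizes are all assumptions chosen for illustration. With a fixed number of samples, the average distance to the nearest neighbor grows as the number of variables grows, which is one way to see why high-dimensional data become sparse.

```python
import math
import random


def mean_nearest_neighbor_distance(n_samples: int, n_dims: int, seed: int = 0) -> float:
    """Average Euclidean distance from each point to its nearest neighbor.

    Hypothetical helper for illustration only: points are drawn uniformly
    from the unit hypercube [0, 1]^n_dims with a fixed seed.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_samples)]
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force search over all other points; fine for small n_samples.
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / n_samples


if __name__ == "__main__":
    # Same sample budget, increasing dimensionality.
    for d in (2, 10, 50):
        print(d, round(mean_nearest_neighbor_distance(200, d), 3))
```

Running the script prints one line per dimensionality. The printed distances increase with the dimension, meaning the same 200 samples cover the space ever more thinly. This is the gap that data augmentation is meant to fill.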

References

Pages: 1445-1461

Page count: 17

49 entries in total

[1] The curse(s) of dimensionality
Altman, Naomi
Krzywinski, Martin
[J]. NATURE METHODS, 2018, 15 (06) : 399 - 400
[2] AN INTRODUCTION TO KERNEL AND NEAREST-NEIGHBOR NONPARAMETRIC REGRESSION
ALTMAN, NS
[J]. AMERICAN STATISTICIAN, 1992, 46 (03) : 175 - 185
[3] Balasubramanian M, 2002, SCIENCE, V295
[4] LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION
BLUMER, A
EHRENFEUCHT, A
HAUSSLER, D
WARMUTH, MK
[J]. JOURNAL OF THE ACM, 1989, 36 (04) : 929 - 965
[5] Bühlmann P, 2011, SPRINGER SER STAT, P1, DOI 10.1007/978-3-642-20192-9
[6] SMOTE: Synthetic minority over-sampling technique
Chawla, Nitesh V.
Bowyer, Kevin W.
Hall, Lawrence O.
Kegelmeyer, W. Philip
[J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 : 321 - 357
[7] Cho JH, 2016, Arxiv, DOI arXiv:1511.06348
[8] A PLANT-WIDE INDUSTRIAL-PROCESS CONTROL PROBLEM
DOWNS, JJ
VOGEL, EF
[J]. COMPUTERS & CHEMICAL ENGINEERING, 1993, 17 (03) : 245 - 255
[9] Minimizing Convolutional Neural Network Training Data With Proper Data Augmentation for Inline Defect Classification
Fujishiro, Akihiro
Nagamura, Yoshikazu
Usami, Tatsuya
Inoue, Masao
[J]. IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, 2021, 34 (03) : 333 - 339
[10] A comparative study of just-in-time-learning based methods for online soft sensor modeling
Ge, Zhiqiang
Song, Zhihuan
[J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2010, 104 (02) : 306 - 317
