Genetic programming-based symbolic regression for goal-oriented dimension reduction

Cited by: 2
Authors
Dorgo, Gyula [1]
Kulcsar, Tibor [1, 2]
Abonyi, Janos [1]
Affiliations
[1] Univ Pannonia, MTA PE Lendulet Complex Syst Monitoring Res Grp, H-8201 Veszprem, Hungary
[2] MAVOCO AG, A-7000 Eisenstadt, Austria
Keywords
Data visualisation; Software sensor; Online near-infrared spectroscopy; Classification; Genetic programming; Principal component analysis; Visualization
DOI
10.1016/j.ces.2021.116769
Chinese Library Classification (CLC) number
TQ [Chemical industry]
Discipline classification code
0817
Abstract
Most dimension reduction techniques are built upon the optimization of an objective function that aims to retain certain characteristics of the projected datapoints, such as the variance of the original dataset, the distances between the datapoints, or their neighbourhood characteristics. Building upon this optimization-based formalization of dimension reduction, a goal-oriented formulation of projection cost functions is proposed. To optimize the application-oriented data visualization cost function, a multi-gene genetic programming (GP)-based algorithm is introduced. The algorithm optimizes the structures of the equations that map high-dimensional data into a two-dimensional space, and it selects the variables needed to explore the internal structure of the data for data-driven software sensor development or classifier design. The main benefit of the approach is that the evolved equations are interpretable and can be used in surrogate models. The applicability of the approach is demonstrated on the benchmark wine dataset and in the estimation of product quality in a diesel oil blending technology based on an online near-infrared (NIR) analyzer. The results illustrate that the algorithm is capable of generating goal-oriented and interpretable features, and that the resulting simple algebraic equations can be implemented directly in applications that need computationally cost-effective projections of high-dimensional data, since they are computationally simpler than alternatives such as neural networks. (c) 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
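Illustrative sketch (not taken from the paper): the idea of a goal-oriented projection cost can be shown with a small example in which a candidate pair of algebraic equations maps each sample to two dimensions and the projection is scored by how well the classes separate there. The toy data, the two candidate mappings, and the choice of leave-one-out 1-nearest-neighbour error as the cost are all assumptions made for illustration. The abstract does not specify the actual cost function, and the GP search over equation structures is omitted here; only the evaluation of fixed candidates is shown.

```python
import math

# Toy labelled samples: (features, class). Hypothetical values, not from the paper.
# Features 0-2 carry class information; feature 3 is noise.
DATA = [
    ((1.0, 2.0, 0.5, 0.10), 0),
    ((1.2, 1.8, 0.4, 0.90), 0),
    ((0.9, 2.2, 0.6, 0.50), 0),
    ((3.0, 0.5, 2.0, 0.12), 1),
    ((3.2, 0.4, 2.1, 0.88), 1),
    ((2.8, 0.6, 1.9, 0.52), 1),
]


def project(mapping, x):
    """Apply a pair of algebraic equations to one sample, giving a 2-D point."""
    f1, f2 = mapping
    return (f1(x), f2(x))


def loo_1nn_error(points, labels):
    """Leave-one-out 1-nearest-neighbour error rate in the projected space."""
    errors = 0
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(points)


def projection_cost(mapping, data):
    """Assumed goal-oriented cost: classification error after projecting to 2-D."""
    points = [project(mapping, x) for x, _ in data]
    labels = [y for _, y in data]
    return loo_1nn_error(points, labels)


# Two hand-written candidate mappings standing in for evolved equations.
candidate_a = (lambda x: x[0] * x[2], lambda x: x[1] - x[0])  # uses informative variables
candidate_b = (lambda x: x[3], lambda x: x[3] * x[3])         # uses only the noise variable

cost_a = projection_cost(candidate_a, DATA)
cost_b = projection_cost(candidate_b, DATA)
print(cost_a, cost_b)
```

In a search of the kind the abstract describes, a cost like this would rank candidate equation pairs, so a mapping built from informative variables (candidate_a) would be preferred over one built from an uninformative variable (candidate_b). Variable selection then follows implicitly from which inputs appear in the surviving equations.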
Pages: 12