Variable selection and data fusion for diesel cetane number prediction

Times cited: 3
Authors
Buendia-Garcia, J. [1,3]
Lacoue-Negre, M. [1,3]
Gornay, J. [1]
Mas-Garcia, S. [2,3]
Bendoula, R. [2,3]
Roger, J. M. [2,3]
Affiliations
[1] IFP Energies Nouvelles, Solaize, France
[2] Univ Montpellier, Inst Agro, ITAP, INRAE, Montpellier, France
[3] ChemHouse Res Grp, Montpellier, France
Keywords
Variable selection; Near-Infrared (NIR); Process variables; Data fusion; Hydrocracking; Diesel fuel; Cetane number; MULTIVARIATE CALIBRATION; SPECTROSCOPY; ALGORITHM; MODEL
DOI
10.1016/j.fuel.2022.126297
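Chinese Library Classification (CLC) numbers
TE [Petroleum and natural gas industry]; TK [Energy and power engineering]
Discipline classification codes
0807; 0820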
Abstract
This study evaluates whether variable selection can improve data fusion models that estimate diesel cetane number from two data blocks: NIR spectra acquired on total effluent samples from a hydrocracking process, and the operating variables of that process. The evaluation was carried out in four steps. First, predictive models were developed using each data block separately. Next, seven variable selection methods were applied to the NIR block and eleven to the process variable block. Then, single-block prediction models were built on each variable subset produced by these methods and compared with the models from the first step. Finally, once the best variable selection method had been identified for each block, data fusion was performed. Two fusion models were built: one using all variables from both blocks, and one using only the previously selected variables. The potential of the sequential and orthogonalized covariance selection (SO-CovSel) method was also analyzed. Data fusion using all variables from both blocks improved the estimation of diesel cetane number compared with the single-block models (about 20% reduction in RMSEP). Applying variable selection before data fusion improved the estimation significantly more (a reduction of about 47% in RMSEP) and gave more stable models in terms of RMSE and r values. Covariance Selection (CovSel) was the most effective method for the NIR block, while sequential backward floating feature selection (SBFFS) performed best for the process variable block.
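The abstract names CovSel as the best selector for the NIR block, describes fusing the selected variables from both blocks, and reports gains as relative RMSEP reductions. The Python/NumPy sketch below illustrates those ingredients under stated assumptions: a plain CovSel (greedy choice of the column with the largest squared covariance with the response, followed by orthogonalisation of X and y on that column), low-level fusion by concatenating blocks autoscaled with training statistics and weighted by 1/sqrt(number of columns), and ordinary least squares as a stand-in for the study's regression model, which the abstract does not specify. The helper names (`covsel`, `fuse_blocks`, `ols_predict`, `rmsep`, `pct_reduction`), the block weighting, and the random data are hypothetical illustrations, not the authors' code or data. For brevity it also applies CovSel to the process block, whereas the study found SBFFS best there.

```python
import numpy as np


def covsel(X, y, n_select):
    """Covariance Selection: greedily pick the column of X with the largest
    squared covariance with y, then deflate X and y by that column."""
    Xr = X - X.mean(axis=0)                       # centre predictors
    Yr = np.asarray(y, dtype=float).reshape(len(X), -1)
    Yr = Yr - Yr.mean(axis=0)                     # centre response(s)
    selected = []
    for _ in range(n_select):
        score = np.sum((Xr.T @ Yr) ** 2, axis=1)  # squared covariance per column
        score[selected] = -np.inf                 # never re-select a column
        j = int(np.argmax(score))
        selected.append(j)
        x = Xr[:, j]
        # Orthogonalise X and Y with respect to the selected column.
        Xr = Xr - np.outer(x, x @ Xr) / (x @ x)
        Yr = Yr - np.outer(x, x @ Yr) / (x @ x)
    return selected


def fuse_blocks(train_blocks, test_blocks):
    """Low-level fusion (assumed scheme): autoscale each block with training
    statistics, weight it by 1/sqrt(n_columns), then concatenate."""
    fused_tr, fused_te = [], []
    for Btr, Bte in zip(train_blocks, test_blocks):
        mu, sd = Btr.mean(axis=0), Btr.std(axis=0, ddof=1)
        w = np.sqrt(Btr.shape[1])
        fused_tr.append((Btr - mu) / (sd * w))
        fused_te.append((Bte - mu) / (sd * w))
    return np.hstack(fused_tr), np.hstack(fused_te)


def ols_predict(Xtr, ytr, Xte):
    """Least-squares fit with intercept (stand-in for the study's model)."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones(len(Xte)), Xte]) @ coef


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on a test set."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def pct_reduction(rmsep_ref, rmsep_new):
    """Relative RMSEP reduction in percent versus a reference model."""
    return 100.0 * (1.0 - rmsep_new / rmsep_ref)


# Synthetic demo data (NOT the study's data): 200 "NIR" and 12 "process" columns.
rng = np.random.default_rng(0)
X_nir = rng.normal(size=(60, 200))
X_proc = rng.normal(size=(60, 12))
y = 2.0 * X_nir[:, 10] - 1.5 * X_proc[:, 3] + 0.1 * rng.normal(size=60)
tr, te = slice(0, 40), slice(40, 60)

nir_cols = covsel(X_nir[tr], y[tr], n_select=5)
proc_cols = covsel(X_proc[tr], y[tr], n_select=3)  # simplification: study used SBFFS here

# Reference: single-block model on the selected NIR variables only.
Ztr_nir, Zte_nir = fuse_blocks([X_nir[tr][:, nir_cols]], [X_nir[te][:, nir_cols]])
err_nir = rmsep(y[te], ols_predict(Ztr_nir, y[tr], Zte_nir))

# Fused model on the selected variables from both blocks.
Ztr, Zte = fuse_blocks([X_nir[tr][:, nir_cols], X_proc[tr][:, proc_cols]],
                       [X_nir[te][:, nir_cols], X_proc[te][:, proc_cols]])
err_fused = rmsep(y[te], ols_predict(Ztr, y[tr], Zte))
print(f"RMSEP NIR-only: {err_nir:.3f}, fused: {err_fused:.3f}, "
      f"reduction: {pct_reduction(err_nir, err_fused):.1f}%")
```

Read this way, a reported reduction of about 20% means the fused model's RMSEP is roughly 0.8 times that of the reference single-block model.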
Beyond more accurate prediction of the property, variable selection also improves analysis and understanding of the process by identifying the variables that significantly affect the property studied.
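The abstract reports SBFFS as the best selector for the process-variable block and credits variable selection with revealing which variables drive the property. Below is a simplified Python/NumPy sketch of sequential backward floating selection: starting from all variables, it repeatedly removes the variable whose removal gives the lowest cross-validated RMSE, then (the "floating" step) re-adds a previously removed variable whenever that beats the best subset already seen at the larger size. The least-squares model, 5-fold interleaved cross-validation, target subset size, function names (`cv_rmse`, `sbffs`), and synthetic data are assumptions for illustration, not the study's configuration.

```python
import numpy as np


def cv_rmse(X, y, cols, n_folds=5):
    """K-fold cross-validated RMSE of a least-squares model on columns `cols`."""
    folds = np.arange(len(y)) % n_folds          # interleaved fold assignment
    sq_err = 0.0
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        A = np.column_stack([np.ones(tr.sum()), X[np.ix_(tr, cols)]])
        coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        B = np.column_stack([np.ones(te.sum()), X[np.ix_(te, cols)]])
        sq_err += float(np.sum((y[te] - B @ coef) ** 2))
    return float(np.sqrt(sq_err / len(y)))


def sbffs(X, y, k, n_folds=5):
    """Simplified SBFFS down to k features, minimising CV RMSE.
    Returns (best CV RMSE, sorted list of kept column indices)."""
    p = X.shape[1]
    current = list(range(p))                     # kept sorted throughout
    best = {p: (cv_rmse(X, y, current, n_folds), current[:])}
    while len(current) > k:
        # Exclusion: drop the feature whose removal yields the lowest CV RMSE.
        score, drop = min((cv_rmse(X, y, [c for c in current if c != f], n_folds), f)
                          for f in current)
        current = [c for c in current if c != drop]
        if len(current) not in best or score < best[len(current)][0]:
            best[len(current)] = (score, current[:])
        # Conditional inclusion: re-add an excluded feature only while that
        # strictly beats the best subset already recorded at the larger size.
        while True:
            excluded = [g for g in range(p) if g not in current]
            if not excluded:
                break
            score, add = min((cv_rmse(X, y, sorted(current + [g]), n_folds), g)
                             for g in excluded)
            if score < best[len(current) + 1][0]:
                current = sorted(current + [add])
                best[len(current)] = (score, current[:])
            else:
                break
    return best[k]


# Synthetic "process variable" block (NOT the study's data).
rng = np.random.default_rng(1)
X_pv = rng.normal(size=(80, 10))
y_pv = 1.5 * X_pv[:, 2] - 2.0 * X_pv[:, 7] + 0.2 * rng.normal(size=80)
err, kept = sbffs(X_pv, y_pv, k=2)
print(f"kept columns: {kept}, CV RMSE: {err:.3f}")
```

On this synthetic block only columns 2 and 7 carry signal, so the retained subset is expected to be those two columns: the kind of interpretable output that lets a selection method point to the variables that matter.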
Pages: 12