A novel graph-based missing values imputation method for industrial lubricant data

Cited by: 12
Authors
Jeong, Soohwan [1]
Joo, Chonghyo [2,3]
Lim, Jongkoo [4]
Cho, Hyungtae [2]
Lim, Sungsu [1]
Kim, Junghwan [3]
Affiliations
[1] Chungnam Natl Univ, Dept Comp Sci Engn, 99 Daehak Ro, Daejeon, South Korea
[2] Korea Inst Ind Technol, Green Mat & Proc R&D Grp, Ulsan 44413, South Korea
[3] Yonsei Univ, Dept Chem & Biomol Engn, Seoul 03722, South Korea
[4] GS Caltex Corp, Res & Dev Ctr, 359 Expo Ro, Daejeon 34122, South Korea
Funding
National Research Foundation, Singapore
Keywords
Lubricant formulation; Missing values; Imputation method; Graph convolutional network; Principal component analysis; Information
DOI
10.1016/j.compind.2023.103937
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Missing values are unavoidable in lubricant formulation data in the chemical industry owing to the complexity of lubricant manufacturing. Therefore, imputing missing values using statistical analysis and data mining is essential to obtain meaningful information such as correlations and patterns. Traditional methods, such as random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), and deep neural networks (DNNs), have been employed for imputing missing values. However, these traditional methods neglect the latent structure of the data because they consider only its feature information. To address this limitation, this study proposes a novel graph-based imputation method (GBIM) that considers both the feature information and the relations between data points to improve model performance. The proposed GBIM expresses the relations between data points as a graph built using dependency modeling and imputes missing values using a graph convolutional network (GCN). Experiments were performed on four physical properties in a lubricant formulation dataset. The results of GBIM were compared with those of traditional imputation methods (RF, k-NN, SVM, and DNN) at missing rates from 5% to 50% in 5% intervals. GBIM achieved 4-7% higher imputation accuracy than the other methods. The proposed GBIM can be applied in various industries as a powerful method for imputing missing values.
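The general idea in the abstract, connecting data points in a graph and filling gaps from neighboring points, can be illustrated with a minimal sketch. This is not the authors' GBIM: the abstract does not specify how the graph is built or how the GCN is trained, so the sketch assumes a k-nearest-neighbor graph in place of dependency modeling and an untrained neighborhood-averaging step (the aggregation a GCN layer applies before its learned weights) in place of a trained network. The function names, the toy data, and the parameters `k` and `n_iter` are all hypothetical.

```python
import numpy as np

def knn_adjacency(X, mask, k=2):
    """Symmetric k-NN graph; distances use only features observed in both rows.
    Assumption: k-NN stands in for the paper's dependency modeling."""
    n = X.shape[0]
    dist = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            shared = mask[i] & mask[j]
            if i != j and shared.any():
                diff = X[i, shared] - X[j, shared]
                dist[i, j] = np.sqrt(np.mean(diff ** 2))
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            if np.isfinite(dist[i, j]):
                A[i, j] = A[j, i] = 1.0
    return A

def propagate_impute(X, mask, k=2, n_iter=20):
    """Fill missing entries by repeated neighborhood averaging on the graph.
    Assumption: untrained propagation, not a trained GCN."""
    A_hat = knn_adjacency(X, mask, k) + np.eye(X.shape[0])  # add self-loops
    P = A_hat / A_hat.sum(axis=1, keepdims=True)            # row-normalize
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(mask, X, col_means)                 # initial guess
    for _ in range(n_iter):
        X_prop = P @ X_filled
        X_filled = np.where(mask, X, X_prop)                # keep observed values fixed
    return X_filled

# Toy data (hypothetical): two well-separated groups, one missing entry.
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, np.nan],
              [5.0, 5.0], [5.1, 5.0], [4.9, 5.0]])
mask = ~np.isnan(X)
X_imp = propagate_impute(X, mask)
```

In this toy case the missing entry is pulled toward the values of the rows it is connected to rather than toward the overall column mean, which is the kind of relational information that feature-only methods do not use.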
Pages: 12