A missing power data filling method based on improved random forest algorithm

Cited by: 24
Authors
Deng W. [1]
Guo Y. [2]
Liu J. [2]
Li Y. [2]
Liu D. [3]
Zhu L. [3]
Affiliations
[1] State Grid Hunan Power Company Limited Research Institute, Changsha
[2] College of Electrical and Information Engineering, Hunan University, Changsha
[3] State Grid Hunan Electric Power Company Limited, Changsha
Source
Chinese Journal of Electrical Engineering | 2019 / Vol. 5 / No. 4
Keywords
Big data cleaning; Data preprocessing; Data quality; Missing data filling; Random forest
DOI
10.23919/CJEE.2019.000025
Abstract
Missing data filling is a key step in power big data preprocessing that helps improve the quality and utilization of electric power data. To overcome the limitations of traditional missing-data filling methods, an improved random forest filling algorithm is proposed. Because electric power data exhibit time-series characteristics in both the horizontal and vertical directions, the improved method combines linear interpolation, matrix combination, and matrix transposition to fill large amounts of missing electric power data. The filling results show that the improved random forest filling algorithm is applicable to electric power data with various missing patterns. Moreover, the filling accuracy is high and the model is stable, which helps improve the quality of electric power data. © 2021 Chinese Journal of Electrical Engineering. All rights reserved.
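The abstract names linear interpolation and matrix transposition as ingredients but gives no algorithmic detail. The sketch below is therefore only an illustration of one plausible pre-filling step, not the authors' algorithm: it interpolates each gap along the rows, reuses the same routine along the columns by transposing the matrix, and averages the two estimates. The row/column layout (for example, days by time points), the averaging rule, and the function names `interpolate_row`, `transpose`, and `prefill` are all assumptions made for this example. The random forest stage and the matrix combination step are omitted.

```python
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing reading


def interpolate_row(row: Row) -> Row:
    """Linearly interpolate interior gaps; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(row) if v is not None]
    if not known:
        return list(row)  # nothing to interpolate from
    filled = list(row)
    for i, value in enumerate(row):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = row[right]
        elif right is None:
            filled[i] = row[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = row[left] + weight * (row[right] - row[left])
    return filled


def transpose(matrix: List[Row]) -> List[Row]:
    """Swap rows and columns so the row routine can be reused on columns."""
    return [list(col) for col in zip(*matrix)]


def prefill(matrix: List[Row]) -> List[Row]:
    """Average the horizontal and vertical interpolation estimates (assumed rule)."""
    horizontal = [interpolate_row(r) for r in matrix]
    vertical = transpose([interpolate_row(c) for c in transpose(matrix)])
    result = []
    for h_row, v_row in zip(horizontal, vertical):
        new_row = []
        for h, v in zip(h_row, v_row):
            candidates = [x for x in (h, v) if x is not None]
            new_row.append(sum(candidates) / len(candidates) if candidates else None)
        result.append(new_row)
    return result


if __name__ == "__main__":
    readings = [
        [1.0, 2.0, 3.0],
        [2.0, None, 4.0],
        [3.0, 4.0, 5.0],
    ]
    for line in prefill(readings):
        print(line)
```

In a fuller pipeline, values pre-filled this way could serve as initial estimates that a regression model such as a random forest then refines; how the paper actually couples the two stages is not stated in the abstract.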
Pages: 33-39
Page count: 6