Medium- and Long-Term Precipitation Forecasting Method Based on Data Augmentation and Machine Learning Algorithms

Cited by: 35
Authors
Tang, Tiantian [1]
Jiao, Donglai [1]
Chen, Tao [2]
Gui, Guan [3]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Geog & Biol Informat, Nanjing 210023, Peoples R China
[2] Nanjing Hydraul Res Inst, Hydrol & Water Resources Dept, Nanjing 210029, Peoples R China
[3] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Forecasting; Clustering algorithms; Machine learning; Data models; Predictive models; Machine learning algorithms; Computational modeling; Extreme gradient boosting (XGB); K-means; long short-term memory (LSTM); machine learning (ML); medium- and long-term precipitation forecasting; random forest (RF); recurrent neural network (RNN); synthetic minority oversampling; SUPPORT VECTOR MACHINE; NEURAL-NETWORK; PART II; MODEL; CLIMATE; SYSTEM; FLOW; UNCERTAINTY; SURFACE; LSTM;
DOI
10.1109/JSTARS.2022.3140442
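Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code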
0808 ; 0809 ;
Abstract
Accurate medium- and long-term precipitation forecasting plays a vital role in disaster prevention and mitigation and in the rational allocation of water resources. In recent years, various methods for medium- and long-term precipitation forecasting based on machine learning algorithms have appeared. However, machine learning requires a large amount of sample data. Therefore, this article proposes a data augmentation algorithm based on the K-means clustering algorithm and the synthetic minority oversampling technique (SMOTE), which can effectively enhance sample information. In addition, random forest (RF), extreme gradient boosting (XGB), recurrent neural network (RNN), and long short-term memory (LSTM) models are each constructed to forecast monthly grid precipitation in the Danjiangkou River Basin. This study aims to improve the accuracy of medium- and long-term precipitation forecasting. The main results are twofold: 1) in most years, the anomaly correlation coefficient and Pg score of SMOTE-km-XGB and SMOTE-km-RF exceed those of XGB and RF; furthermore, compared with the other three methods, the SMOTE-km-XGB method is more suitable for precipitation forecasting in the basin studied in this article; and 2) the forecasting results of the two deep learning methods (RNN and LSTM) show that sample data processed by the K-means clustering algorithm and the SMOTE data augmentation algorithm do not yield a considerable improvement in deep learning. This study improves the accuracy of precipitation forecasting by expanding and balancing the information in the sample data, and provides a new research idea for improving the accuracy of medium- and long-term hydrological forecasting.
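The abstract names the two ingredients of the augmentation step (K-means clustering and SMOTE) but not how they are combined. The following is a minimal sketch of one plausible combination, not the authors' implementation: samples are clustered with K-means, and every cluster smaller than the largest one is topped up with SMOTE-style points interpolated between a sample and one of its nearest neighbours in the same cluster. The function names (kmeans, smote_within_clusters), the choice to interpolate predictors and target jointly, the balancing target, and the toy data are all assumptions made for illustration; only NumPy is used.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def smote_within_clusters(X, y, labels, n_neighbors=3, seed=0):
    """Oversample each cluster up to the size of the largest cluster.

    Assumption: predictors and target are interpolated together, and
    neighbours are searched in that joint space.
    """
    rng = np.random.default_rng(seed)
    Z = np.column_stack([X, y])
    target_size = np.bincount(labels).max()
    synthetic = []
    for j in np.unique(labels):
        members = Z[labels == j]
        n_new = target_size - len(members)
        if len(members) < 2 or n_new == 0:
            continue  # nothing to interpolate between, or already balanced
        k = min(n_neighbors, len(members) - 1)
        for _ in range(n_new):
            i = rng.integers(len(members))
            d = np.linalg.norm(members - members[i], axis=1)
            neighbors = np.argsort(d)[1:k + 1]  # position 0 is the sample itself
            nb = members[rng.choice(neighbors)]
            gap = rng.random()  # random position on the segment between the two
            synthetic.append(members[i] + gap * (nb - members[i]))
    if synthetic:
        Z = np.vstack([Z, np.array(synthetic)])
    return Z[:, :-1], Z[:, -1]


# Toy data (hypothetical): 40 samples from one regime, 8 from another.
demo_rng = np.random.default_rng(42)
X = np.vstack([demo_rng.normal(0, 1, (40, 3)), demo_rng.normal(5, 1, (8, 3))])
y = X.sum(axis=1)

labels = kmeans(X, k=2)
X_aug, y_aug = smote_within_clusters(X, y, labels)
print(X.shape, "->", X_aug.shape)
```

The augmented arrays keep the original samples first and append the synthetic ones, so they can be passed to any regressor (for example an RF or XGB model) in place of the original training set. Because each synthetic point is a convex combination of two real samples, it always lies within the range of the observed data.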
Pages: 1000-1011
Page count: 12
Related papers
53 in total
[1]   Evolutionary computational intelligence algorithm coupled with self-tuning predictive model for water quality index determination [J].
Abba, S. I. ;
Hadi, Sinan Jasim ;
Sammen, Saad Sh ;
Salih, Sinan Q. ;
Abdulkadir, R. A. ;
Quoc Bao Pham ;
Yaseen, Zaher Mundher .
JOURNAL OF HYDROLOGY, 2020, 587
[2]  
Abdulaziz Yousra, 2010, Proceedings of the 2010 International Conference on Information Retrieval and Knowledge Management (CAMP 2010), P260, DOI 10.1109/INFRKM.2010.5466907
[3]  
[Anonymous], 2015, 2015 3 WORLD C COMPL
[4]   LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT [J].
BENGIO, Y ;
SIMARD, P ;
FRASCONI, P .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02) :157-166
[5]   Atmospheric Infrared Sounder (AIRS) sounding evaluation and analysis of the pre-convective environment [J].
Botes, Danelle ;
Mecikalski, John R. ;
Jedlovec, Gary J. .
JOURNAL OF GEOPHYSICAL RESEARCH-ATMOSPHERES, 2012, 117
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[8]  
Cai, 2020, ANN NUCL ENERGY, V150
[9]   Self-supervised data augmentation for person re-identification [J].
Chen, Feng ;
Wang, Nian ;
Tang, Jun ;
Liang, Dong ;
Feng, Hao .
NEUROCOMPUTING, 2020, 415 :48-59
[10]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794