Evaluation of machine learning methods for formation lithology identification: A comparison of tuning processes and model performances

Cited by: 229
Authors
Xie, Yunxin [1 ,2 ]
Zhu, Chenyang [3 ]
Zhou, Wen [2 ]
Li, Zhongdong [1 ,2 ]
Liu, Xuan [1 ,2 ]
Tu, Mei [4 ]
Affiliations
[1] Chengdu Univ Technol, Coll Energy, Chengdu 610051, Sichuan, Peoples R China
[2] Chengdu Univ Technol, State Key Lab Oil & Gas Reservoir Geol & Exploita, Chengdu 610051, Sichuan, Peoples R China
[3] Univ Southampton, Dept Elect & Comp Sci, Southampton SO17 1BJ, Hants, England
[4] Sinopec Oilfield Serv Jianghan Corp, Logging Co, Qianjiang 433123, Peoples R China
Keywords
Lithology identification; Supervised learning; Gradient boosting; Tuning parameter; Training data; Logs; Prediction; Gas
DOI
10.1016/j.petrol.2017.10.028
Chinese Library Classification (CLC)
TE [Petroleum and Natural Gas Industry]; TK [Energy and Power Engineering]
Subject Classification
0807; 0820
Abstract
Identification of underground formation lithology from well log data is an important task in petroleum exploration and engineering. Recently, several computational algorithms have been used for lithology identification to improve prediction accuracy. In this paper, we evaluate five typical machine learning methods, namely Naive Bayes, Support Vector Machine, Artificial Neural Network, Random Forest and Gradient Tree Boosting, for formation lithology identification using data from the Daniudui gas field and the Hangjinqi gas field. The input to each model consists of features selected from different well log data samples. To determine the best model for classifying lithology type, this study used validation curves to determine the parameter search range and adopted hyper-parameter optimization to obtain the best parameter set for each model. The performance of each classifier was also evaluated using 5-fold cross-validation. The results suggest that ensemble methods are good algorithm choices for supervised classification of lithology from well log data. The Gradient Tree Boosting classifier is robust to overfitting because it grows trees sequentially, adjusting the weight of the training data distribution to minimize a loss function. The Random Forest classifier is also a suitable option. An evaluation matrix showed that the Gradient Tree Boosting and Random Forest classifiers have lower prediction errors than the other three models. Although all the models have difficulty distinguishing sandstone classes, Gradient Tree Boosting performs well on this task compared with the other four methods. Moreover, the classification accuracy is remarkably similar across the lithology classes for both the Random Forest and Gradient Tree Boosting models.
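The abstract describes a tuning workflow of validation curves to pick a parameter search range, hyper-parameter optimization, and 5-fold cross-validation. The Python sketch below is not taken from the paper; it only illustrates what such a workflow could look like with scikit-learn's GradientBoostingClassifier. The feature matrix, lithology labels, and parameter grids are hypothetical placeholders.

# Illustrative sketch (assumptions: scikit-learn estimators, placeholder
# well-log data and parameter ranges), mirroring the tuning workflow
# described in the abstract.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, validation_curve

# X: well-log feature matrix (placeholder for selected log readings),
# y: lithology class labels (placeholder).
X = np.random.rand(500, 5)
y = np.random.randint(0, 4, 500)

# 1) Validation curve: inspect how accuracy varies with n_estimators
#    to choose a sensible search range.
train_scores, valid_scores = validation_curve(
    GradientBoostingClassifier(), X, y,
    param_name="n_estimators", param_range=[50, 100, 200, 400], cv=5,
)

# 2) Hyper-parameter optimization over the chosen range.
grid = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={"n_estimators": [100, 200, 400],
                "learning_rate": [0.05, 0.1],
                "max_depth": [3, 5]},
    cv=5,
)
grid.fit(X, y)

# 3) 5-fold cross-validation of the tuned model.
scores = cross_val_score(grid.best_estimator_, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

The same pattern applies to the other four classifiers; only the estimator and its parameter grid change.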
Pages: 182-193
Number of pages: 12