How to improve machine learning models for lithofacies identification by practical and novel ensemble strategy and principles

Times cited: 22
Authors
Dong, Shao-Qun [1, 2]
Sun, Yan-Ming [1, 2]
Xu, Tao [1, 2]
Zeng, Lian-Bo [1, 3]
Du, Xiang-Yi [1, 3]
Yang, Xu [1, 2]
Liang, Yu [1, 2]
Affiliations
[1] China Univ Petr, State Key Lab Petr Resources & Prospecting, Beijing 102249, Peoples R China
[2] China Univ Petr, Coll Sci, Beijing 102249, Peoples R China
[3] China Univ Petr, Coll Geosci, Beijing 102249, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Lithofacies identification; Machine learning; Ensemble learning strategy; Ensemble principle; Homogeneous ensemble; Heterogeneous ensemble; LITHOLOGY IDENTIFICATION; DISCRIMINANT-ANALYSIS; PREDICTION; FACIES; BASIN; FIELD; ZONE
DOI
10.1016/j.petsci.2022.09.006
Chinese Library Classification (CLC) number
TE [Petroleum and natural gas industry]; TK [Energy and power engineering]
Discipline codes
0807; 0820
Abstract
Typically, the relationship between well logs and lithofacies is complex, which leads to low accuracy in lithofacies identification. Machine learning (ML) methods are often applied to identify lithofacies from well logs labelled with rock-core data. However, the accuracy of these methods is limited to some extent. To further improve it, a practical and novel ensemble learning strategy and a set of ensemble principles are proposed in this work. They allow geologists unfamiliar with ML to establish a good ML lithofacies identification model, and help geologists familiar with ML further improve the accuracy of lithofacies identification. The ensemble learning strategy combines ML methods as sub-classifiers to generate a comprehensive lithofacies identification model, with the aim of reducing the variance error of the prediction. Each sub-classifier is trained on randomly sampled labelled data with randomly selected features. The novelty of this work lies in the ensemble principles, which make the sub-classifiers just overfit through algorithm parameter settings and sub-dataset sampling. These principles help reduce the bias error of the prediction. Two issues are discussed, namely (1) whether only relatively simple single-classifier methods can serve as sub-classifiers, and how to select proper ML methods as sub-classifiers; and (2) whether different kinds of ML methods can be combined as sub-classifiers and, if so, how to determine a proper combination. To test the effectiveness of the ensemble strategy and principles for lithofacies identification, different kinds of ML algorithms are selected as sub-classifiers, including regular classifiers (LDA, NB, KNN, ID3 tree and CART), a kernel method (SVM), and ensemble learning algorithms (RF, AdaBoost, XGBoost and LightGBM). The experiments use a published lithofacies dataset from the Daniudi gas field (DGF) in the Ordos Basin, China.
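The ensemble strategy described above trains each sub-classifier on randomly sampled labelled data with random features and then combines the sub-classifiers into one model. One way to sketch that is bootstrap sampling plus random feature subsets, followed by a majority vote. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The 1-nearest-neighbour sub-classifier, the majority-vote combination rule, and the hypothetical names `OneNN`, `train_homogeneous_ensemble` and `predict` are all choices made here for brevity. A 1-NN model reproduces its training labels exactly, so it stands in for a sub-classifier pushed toward overfitting by its parameter setting.

```python
import random
from collections import Counter
from typing import List, Sequence

Row = Sequence[float]  # one depth sample: a vector of log readings


class OneNN:
    """1-nearest-neighbour on a fixed subset of features (hypothetical helper).

    With k = 1 the model reproduces every training label exactly, i.e. it is
    deliberately pushed toward overfitting (low bias, high variance).
    """

    def __init__(self, features: List[int]):
        self.features = features  # indices of the logs this member may see

    def fit(self, X: List[Row], y: List[str]) -> "OneNN":
        self.X = [[row[j] for j in self.features] for row in X]
        self.y = list(y)
        return self

    def predict_one(self, row: Row) -> str:
        x = [row[j] for j in self.features]
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


def train_homogeneous_ensemble(
    X: List[Row], y: List[str], n_members: int = 15, n_features: int = 2, seed: int = 0
) -> List[OneNN]:
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        # Bootstrap sample of labelled rows (drawn with replacement).
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Random subset of features (log curves), drawn without replacement.
        feats = rng.sample(range(n_cols), n_features)
        members.append(OneNN(feats).fit([X[i] for i in rows], [y[i] for i in rows]))
    return members


def predict(members: List[OneNN], row: Row) -> str:
    # Majority vote over all sub-classifiers (an assumed combination rule).
    return Counter(m.predict_one(row) for m in members).most_common(1)[0][0]
```

`predict(train_homogeneous_ensemble(X, y), row)` returns the majority lithofacies label for `row`. Adding members mainly averages away the variance of the individual 1-NN models, while each model keeps bias low. This loosely mirrors the variance/bias split that the abstract attributes to the strategy and the principles.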
Based on a series of comparisons between the ML algorithms and their corresponding ensemble models built with the ensemble strategy and principles, the following conclusions are drawn: (1) not only decision trees but also other single classifiers and ensemble-learning classifiers can be used as sub-classifiers in homogeneous ensemble learning, and the ensemble can improve the accuracy of the original classifiers; (2) the ensemble principles for the introduced homogeneous and heterogeneous ensemble strategies are effective in promoting ML for lithofacies identification; (3) in practice, heterogeneous ensemble is better suited to building a more powerful lithofacies identification model, although it is more complex.
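Conclusion (3) refers to heterogeneous ensembles, which mix different kinds of ML methods as sub-classifiers. The sketch below pools bootstrap-trained members of two different types and takes a plain majority vote. All of the following are illustrative assumptions: the two member types (1-nearest-neighbour, and a nearest-centroid classifier used as a simplified, identity-covariance stand-in for LDA), the equal number of members per type, the voting rule, and the hypothetical helper names. The abstract does not state how the authors select or weight the combination.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Classifier = Callable[[Row], str]


def sq_dist(a: Row, b: Row) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_one_nn(X: List[Row], y: List[str]) -> Classifier:
    """1-nearest-neighbour: a low-bias, high-variance member type."""
    def predict(row: Row) -> str:
        dists = [sq_dist(row, xi) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def fit_nearest_centroid(X: List[Row], y: List[str]) -> Classifier:
    """Nearest class mean: a simplified, identity-covariance stand-in for LDA."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    def predict(row: Row) -> str:
        return min(centroids, key=lambda lab: sq_dist(row, centroids[lab]))
    return predict


def train_heterogeneous_ensemble(
    X: List[Row], y: List[str],
    fitters: List[Callable[[List[Row], List[str]], Classifier]],
    members_per_type: int = 5, seed: int = 0,
) -> List[Classifier]:
    rng = random.Random(seed)
    n = len(X)
    members: List[Classifier] = []
    for fit in fitters:  # one pool entry per algorithm type
        for _ in range(members_per_type):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
            members.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return members


def vote(members: List[Classifier], row: Row) -> str:
    # Unweighted majority vote across all member types.
    return Counter(m(row) for m in members).most_common(1)[0][0]
```

For example, `vote(train_heterogeneous_ensemble(X, y, [fit_one_nn, fit_nearest_centroid]), row)` returns the majority lithofacies label for `row`. Extending the pool to a third algorithm type only requires adding another fitter to the list. Choosing which types to combine is the harder question the paper addresses, and this sketch does not attempt it.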
Pages: 733-752
Page count: 20