Classification with noisy labels through tree-based models and semi-supervised learning: A case study of lithology identification

Cited by: 15
Authors
Zhu, Xinyi [1]
Zhang, Hongbing [1]
Zhu, Rui [2]
Ren, Quan [1]
Zhang, Lingyuan [1]
Affiliations
[1] Hohai Univ, Sch Earth Sci & Engn, Nanjing 211100, Peoples R China
[2] Nanjing Agr Univ, Coll Food Sci & Technol, Nanjing 211100, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Isolation forest; Semi-supervised learning; Tri-training; Learning with noisy labels; Lithology identification; FOREST
DOI
10.1016/j.eswa.2023.122506
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Lithology identification is a crucial task in reservoir characterization and evaluation. The response between formation lithology and logging data is intricate and non-linear. Lithology mislabeling is difficult to avoid because of human error and interpretation coarsening, and label quality can seriously affect the effectiveness of supervised learning, so methods that learn from noisy labels are essential. Existing noise-filtering methods and noise-robust algorithms each concentrate on a single aspect, either the data or the algorithm. In this paper, a hybrid noise label filtering and correction framework for lithology identification (HNFCL) is proposed. Isolation forest is used to detect suspicious data because it is fast and efficient. Baseline classifiers are built from ensemble tree models. In particular, the labels of abnormal data are removed, and the Tri-training semi-supervised method is introduced to relabel these data, which minimizes the loss of valid training data. Comprehensive experiments comparing the HNFCL framework, noise-filtering methods, and deep neural network methods with optimized loss functions were carried out in the industrial application of logging lithology identification. HNFCL achieved average accuracies of 87.94% and 94.93% in two study wells. These results outperformed the noise-filtering methods and showed no significant difference from the state-of-the-art method. The noise correction provided by HNFCL offers a promising prospect for lithology identification applications.
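The filter-then-relabel idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, synthetic data in place of well logs, integer class labels, an isolation forest fit separately within each labelled class (the abstract does not say how it is applied), and a simplified single-round variant of Tri-training without the error-rate checks of the original algorithm. The function name `hnfcl_sketch` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              IsolationForest, RandomForestClassifier)
from sklearn.utils import resample


def hnfcl_sketch(X, y_noisy, contamination=0.1, seed=0):
    """Flag suspicious samples, drop their labels, relabel them by voting."""
    # Step 1 (assumption): fit one isolation forest per labelled class and
    # treat its anomalies (-1) as samples with suspicious labels.
    suspicious = np.zeros(len(y_noisy), dtype=bool)
    for c in np.unique(y_noisy):
        idx = np.flatnonzero(y_noisy == c)
        iso = IsolationForest(contamination=contamination, random_state=seed)
        suspicious[idx] = iso.fit_predict(X[idx]) == -1
    if not suspicious.any():
        return y_noisy.copy(), suspicious
    X_lab, y_lab = X[~suspicious], y_noisy[~suspicious]
    X_unl = X[suspicious]  # labels of these samples are discarded

    # Step 2: three tree-ensemble baselines, each fit on a bootstrap sample.
    models = [RandomForestClassifier(n_estimators=50, random_state=seed),
              ExtraTreesClassifier(n_estimators=50, random_state=seed),
              GradientBoostingClassifier(random_state=seed)]
    for i, m in enumerate(models):
        Xb, yb = resample(X_lab, y_lab, random_state=seed + i)
        m.fit(Xb, yb)

    # Step 3: one simplified tri-training round. Each model is refit on the
    # kept data plus the unlabelled samples its two peers agree on.
    preds = np.array([m.predict(X_unl) for m in models])
    for i, m in enumerate(models):
        j, k = [t for t in range(3) if t != i]
        agree = preds[j] == preds[k]
        if agree.any():
            m.fit(np.vstack([X_lab, X_unl[agree]]),
                  np.concatenate([y_lab, preds[j][agree]]))

    # Step 4: relabel the suspicious samples by majority vote.
    final = np.array([m.predict(X_unl) for m in models])
    y_fixed = y_noisy.copy()
    y_fixed[suspicious] = [np.bincount(col).argmax() for col in final.T]
    return y_fixed, suspicious


# Synthetic stand-in for logging data: 3 classes, 10% of labels flipped.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
rng = np.random.RandomState(0)
flip = rng.choice(len(y), size=30, replace=False)
y_noisy = y.copy()
y_noisy[flip] = (y[flip] + 1) % 3

y_fixed, suspicious = hnfcl_sketch(X, y_noisy)
print("flagged:", int(suspicious.sum()))
print("label accuracy before:", float((y_noisy == y).mean()))
print("label accuracy after: ", float((y_fixed == y).mean()))
```

Only the flagged samples are relabelled; every other label is passed through unchanged, which mirrors the abstract's goal of keeping as much training data as possible. Whether the corrected labels are actually better depends on the data and on how well the anomaly detector isolates mislabeled samples.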
Pages: 13