Interpretable machine learning models for failure cause prediction in imbalanced oil pipeline data

Cited by: 2
Authors
Awuku, Bright [1]
Huang, Ying [1]
Yodo, Nita [1]
Asa, Eric [1]
Affiliations
[1] North Dakota State Univ, Dept Civil Construct & Environm Engn, Fargo, ND 58102 USA
Funding
U.S. National Science Foundation
Keywords
energy; oil; machine learning; deep learning; interpretability; pipeline; failure
DOI
10.1088/1361-6501/ad3570
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Pipelines are critical arteries in the oil and gas industry and require massive capital investment to safely construct networks that transport hydrocarbons across diverse environments. However, these pipeline systems are prone to integrity failure, which results in significant economic losses and environmental damage. Accurate prediction of pipeline failure events using historical oil pipeline accident data enables asset managers to plan sufficient maintenance, rehabilitation, and repair activities to prevent catastrophic failures. However, learning the complex interdependencies between pipeline attributes and rare failure events presents several analytical challenges. This study proposes a novel machine learning (ML) framework to accurately predict pipeline failure causes on highly class-imbalanced data compiled by the United States Pipeline and Hazardous Materials Safety Administration. Natural language processing techniques were leveraged to extract informative features from unstructured text data. Furthermore, class imbalance in the dataset was addressed via oversampling and intrinsic cost-sensitive learning (CSL) strategies adapted for the multi-class case. Nine machine and deep learning architectures were benchmarked, with LightGBM demonstrating superior performance. The integration of CSL yielded an 86% F1 score and a 0.82 Cohen kappa score, significantly advancing prior research. This study leveraged a comprehensive SHapley Additive exPlanations (SHAP) analysis to interpret the predictions from the LightGBM algorithm, revealing the key factors driving failure probabilities. Leveraging sentiment analysis allowed the models to capture a richer, more multifaceted representation of the textual data. This study developed a novel CSL approach that integrates domain knowledge regarding the varying cost impacts of misclassifying different failure types into ML models. This research demonstrated an effective fusion of text insights from inspection reports with structured pipeline data that enhances model interpretability. The resulting AI modeling framework generated data-driven predictions of the causes of failure that could provide transportation agencies with actionable insights. These insights enable tailored preventative maintenance decisions to proactively mitigate emerging pipeline failures.
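The abstract names cost-sensitive learning and the Cohen kappa score but does not give formulas. The following is a minimal sketch under stated assumptions, not the authors' method: it uses plain inverse-frequency class weights as a stand-in for the paper's domain-informed misclassification costs, and computes Cohen's kappa from its standard definition. The function names and the failure-cause labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Assumption: a generic stand-in for the domain-informed costs the
    abstract describes; the paper's actual cost values are not given.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction matches truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal class frequencies, summed over classes.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class on both sides, so agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical, illustrative label distribution (not data from the paper).
labels = ["corrosion"] * 6 + ["equipment"] * 3 + ["excavation"] * 1
weights = balanced_class_weights(labels)
print(weights)  # rarer classes receive larger weights
```

Weights of this kind are typically passed to a classifier as per-class or per-sample weights so that errors on rare failure causes are penalized more heavily during training.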
Pages: 18