Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

Authors
Soomro, Afzal Ahmed [1, 2]
Mokhtar, Ainul Akmar [2]
Muhammad, Masdi B. [2]
Saad, Mohamad Hanif Md [1]
Lashari, Najeebullah [3, 4]
Hussain, Muhammad [5]
Palli, Abdul Sattar [6]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Mech & Mfg Engn, Bangi 43600, Selangor, Malaysia
[2] Univ Teknol PETRONAS, Mech Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[3] Dawood Univ Engn & Technol, Petr & Gas Engn Dept, MA Jinnah Rd, Karachi 74800, Pakistan
[4] Univ Teknol PETRONAS, Petr Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[5] Univ Wollongong, Northfields Ave Wollongong, Wollongong, NSW 2522, Australia
[6] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak Darul, Malaysia
Keywords
Burst pressure prediction; Machine learning; SMOTE; Data augmentation; Oil and gas pipelines; Safety; FAILURE PRESSURE; CORROSION DEFECTS; CHARGE; STEEL; STATE;
DOI
10.1016/j.rineng.2024.103233
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. Machine learning (ML) models have supplanted traditional methods in recent years. However, small and imbalanced datasets remain a major obstacle to building an ML model that produces accurate results. Moreover, ML models trained on a dataset of pipelines with specific material grades generalize poorly, which prevents them from producing good results on other pipeline types. In this work, finite element analysis (FEA) was first used to generate a dataset. Then, a new way to improve ML model generalization for burst pressure prediction is proposed: combining publicly available datasets that cover different pipeline specifications. In this combined dataset, some pipelines are represented by many data samples and others by few, which causes a class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate their prediction accuracy and generalization on pipelines of various material grades. The results show that all the selected ML models achieved a high R-squared (>0.95) on the balanced data, an improvement over the imbalanced dataset. These results indicate that SMOTE-based augmentation is an effective way to correct dataset imbalance and improve ML predictions of burst pressure in oil and gas pipelines.
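The core idea behind SMOTE can be illustrated with a short sketch: each synthetic sample is placed at a random point on the line segment between a minority-group sample and one of its nearest neighbours in the same group. The code below is only a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The function name `smote_oversample`, the two feature columns, and all numeric values are hypothetical, and the abstract does not say how the burst pressure target was handled for synthetic rows.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-group neighbours."""
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature rows
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base row within the minority group (excluding itself)
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: sq_dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical rows: [defect depth / wall thickness, defect length in mm]
majority_group = [[0.20, 100.0], [0.25, 120.0], [0.30, 150.0],
                  [0.35, 180.0], [0.40, 200.0], [0.45, 220.0]]
minority_group = [[0.50, 300.0], [0.55, 320.0], [0.60, 350.0]]

# Generate enough synthetic rows to match the size of the larger group
new_rows = smote_oversample(minority_group, n_new=len(majority_group) - len(minority_group))
balanced_minority = minority_group + new_rows
print(len(balanced_minority), len(majority_group))
```

Because every synthetic row is an interpolation between two real rows, it stays inside the range already covered by the under-represented pipeline group. In practice features are usually scaled before the neighbour search, since otherwise the column with the largest numeric range dominates the distance.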
Pages: 17