Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

Cited by: 0
Authors
Soomro, Afzal Ahmed [1 ,2 ]
Mokhtar, Ainul Akmar [2 ]
Muhammad, Masdi B. [2 ]
Saad, Mohamad Hanif Md [1 ]
Lashari, Najeebullah [3 ,4 ]
Hussain, Muhammad [5 ]
Palli, Abdul Sattar [6 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Mech & Mfg Engn, Bangi 43600, Selangor, Malaysia
[2] Univ Teknol PETRONAS, Mech Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[3] Dawood Univ Engn & Technol, Petr & Gas Engn Dept, MA Jinnah Rd, Karachi 74800, Pakistan
[4] Univ Teknol PETRONAS, Petr Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[5] Univ Wollongong, Northfields Ave Wollongong, Wollongong, NSW 2522, Australia
[6] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak Darul, Malaysia
Keywords
Burst pressure prediction; Machine learning; SMOTE; Data augmentation; Oil and gas pipelines; Safety; FAILURE PRESSURE; CORROSION DEFECTS; CHARGE; STEEL; STATE;
DOI
10.1016/j.rineng.2024.103233
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. Machine learning (ML) models have supplanted traditional methods in recent years. However, small and imbalanced datasets remain a major obstacle to building an ML model that produces accurate results. Moreover, ML models trained on a dataset of pipelines with specific material grades generalize poorly, which prevents them from producing good results on other pipeline types. First, finite element analysis (FEA) was used to generate a dataset. Then, a new way to improve ML model generalization for burst pressure prediction is proposed: combining publicly available datasets of different pipeline specifications. In this combined dataset, some pipelines have many data samples and others have few, which causes a class imbalance issue. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate prediction and generalization on pipelines of various material grades. Results show that all the selected ML models produced higher R-squared values (>0.95) on the balanced data than on the imbalanced dataset. These results indicate that SMOTE-based augmentation is a beneficial way to fix dataset imbalance and improve ML predictions of burst pressure in oil and gas pipelines.
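
The interpolation idea behind SMOTE, as summarized in the abstract, can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' pipeline: the function names `smote_oversample` and `balance_groups`, the grouping by pipeline type, the neighbour count `k`, and the sample values are all assumptions made for the example. Each row is treated as one numeric vector, so a target column would be interpolated along with the features.

```python
import math
import random


def smote_oversample(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    row and one of its k nearest neighbours (Euclidean distance)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(rows) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Rank the other rows by distance to the base row; keep the k closest.
        others = [r for r in rows if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_groups(groups, k=3, seed=0):
    """Oversample every group up to the size of the largest group."""
    target = max(len(rows) for rows in groups.values())
    balanced = {}
    for i, (name, rows) in enumerate(groups.items()):
        extra = []
        if len(rows) < target:
            extra = smote_oversample(rows, target - len(rows), k, seed + i)
        balanced[name] = list(rows) + extra
    return balanced


if __name__ == "__main__":
    # Hypothetical rows: [wall thickness (mm), defect depth (mm), burst pressure (MPa)].
    # The group names and numbers are placeholders, not data from the paper.
    groups = {
        "grade_A": [[9.5, 2.0, 21.0], [9.5, 4.0, 18.5], [9.5, 6.0, 15.0], [9.5, 7.0, 12.5]],
        "grade_B": [[12.7, 3.0, 27.0], [12.7, 6.0, 22.0]],
    }
    for name, rows in balance_groups(groups, k=1).items():
        print(name, len(rows))
```

Because every synthetic row lies on a line segment between two existing rows of the same group, the new samples stay inside the range already covered by that group. In practice an established implementation, such as the one in the imbalanced-learn package, would likely be preferable to a hand-written version.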
Pages: 17