Data augmentation using SMOTE technique: Application for prediction of burst pressure of hydrocarbons pipeline using supervised machine learning models

Cited by: 0
Authors
Soomro, Afzal Ahmed [1, 2]
Mokhtar, Ainul Akmar [2 ]
Muhammad, Masdi B. [2 ]
Saad, Mohamad Hanif Md [1 ]
Lashari, Najeebullah [3, 4]
Hussain, Muhammad [5 ]
Palli, Abdul Sattar [6 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Dept Mech & Mfg Engn, Bangi 43600, Selangor, Malaysia
[2] Univ Teknol PETRONAS, Mech Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[3] Dawood Univ Engn & Technol, Petr & Gas Engn Dept, MA Jinnah Rd, Karachi 74800, Pakistan
[4] Univ Teknol PETRONAS, Petr Engn Dept, Seri Iskandar 32610, Perak Darul, Malaysia
[5] Univ Wollongong, Northfields Ave Wollongong, Wollongong, NSW 2522, Australia
[6] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak Darul, Malaysia
Keywords
Burst pressure prediction; Machine learning; SMOTE; Data augmentation; Oil and gas pipelines; Safety; FAILURE PRESSURE; CORROSION DEFECTS; CHARGE; STEEL; STATE;
DOI
10.1016/j.rineng.2024.103233
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Accurate burst pressure prediction is critical for ensuring oil and gas pipeline safety, guiding maintenance decisions, and lowering costs and risks. Traditional methods have limitations, including high experimental costs, conservative empirical models, and computationally expensive numerical algorithms. Machine learning (ML) models have supplanted traditional methods in recent years. However, small and imbalanced datasets are a major challenge in building an ML model that generates accurate results. Moreover, ML models trained on a dataset of pipelines with specific material grades generalize poorly, which prevents them from producing good results on other pipeline types. First, finite element analysis (FEA) was used to generate a dataset. Then, a new way to improve ML model generalization for burst pressure prediction is proposed: combining publicly available datasets of different pipeline specifications. In this combined dataset, some pipelines have many data samples and others have few, which causes a class imbalance issue. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address this imbalance. The performance of several ML models, namely Extra Trees (ET), Extreme Gradient Boosting (XGBR), Random Forest (RF), Light Gradient Boosting Machine (LGBM), and Decision Tree (DT), was evaluated to validate prediction and generalization on pipelines of various material grades. Results show that all the selected ML models produced a high R-squared (>0.95) on the balanced data, compared with the imbalanced dataset. These results show that SMOTE-based augmentation is a beneficial way to fix dataset imbalance and improve ML models for predicting burst pressure in oil and gas pipelines.
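The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sample is a numeric row, that the class label is the pipeline group, and that the burst pressure column is interpolated together with the features. The function name `smote_oversample`, the grade labels, and all numbers are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def smote_oversample(rows, labels, k=3, seed=0):
    """Balance classes by interpolating between same-class nearest neighbours."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    new_rows, new_labels = list(rows), list(labels)

    for cls, n in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        if n == target or len(members) < 2:
            continue  # already balanced, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of the base sample (itself excluded)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position on the segment from base to mate
            new_rows.append([b + gap * (m - b) for b, m in zip(base, mate)])
            new_labels.append(cls)
    return new_rows, new_labels


# Hypothetical rows: [diameter mm, wall thickness mm, defect depth mm, burst pressure MPa]
rows = [
    [508.0, 6.6, 2.1, 14.2], [508.0, 6.6, 3.0, 13.1], [508.0, 6.6, 4.2, 11.5],
    [508.0, 6.6, 1.5, 15.0], [508.0, 6.6, 2.7, 13.6], [508.0, 6.6, 3.6, 12.3],
    [762.0, 17.5, 4.4, 24.1], [762.0, 17.5, 8.8, 21.8], [762.0, 17.5, 13.1, 17.2],
]
labels = ["X52"] * 6 + ["X80"] * 3  # hypothetical pipeline groups

balanced_rows, balanced_labels = smote_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because neighbours are chosen by Euclidean distance, columns on very different scales would normally be standardized first. Maintained implementations, such as the SMOTE class in the imbalanced-learn package, are a more robust choice for real work.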
Pages: 17