A machine learning approach for corrosion small datasets

Cited by: 52
Authors
Sutojo, Totok [1,2]
Rustad, Supriadi [1,2]
Akrom, Muhamad [1,3]
Syukur, Abdul [2]
Shidik, Guruh Fajar [2]
Dipojono, Hermawan Kresno [3]
Affiliations
[1] Dian Nuswantoro Univ, Fac Comp Sci, Res Ctr Mat Informat, Semarang 50131, Indonesia
[2] Dian Nuswantoro Univ, Fac Comp Sci, Doctoral Program Comp Sci, Semarang 50131, Indonesia
[3] Bandung Inst Technol, Adv Funct Mat Res Grp, Bandung 40132, Indonesia
Keywords
INHIBITION EFFICIENCY; BENZIMIDAZOLE DERIVATIVES; GAS-INDUSTRY; MILD-STEEL; PREDICTION; DESIGN; MODEL; OIL
DOI
10.1038/s41529-023-00336-7
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
In this work, we developed a quantitative structure-activity relationship (QSAR) model using the K-Nearest Neighbor (KNN) algorithm to predict the corrosion inhibition performance of inhibitor compounds. To overcome the limitations of small datasets, virtual samples are generated with a Virtual Sample Generation (VSG) method and added to the training set. The generalizability of the proposed KNN + VSG model is verified on six small datasets taken from the literature by comparing prediction performance. On all six datasets, the proposed model gives the most accurate predictions. Adding virtual samples to the training data helps the algorithm recognize feature-target relationship patterns and thereby increases the number of quantum chemical parameters that correlate with corrosion inhibition efficiency. The proposed method strengthens the prospects of machine learning (ML) for material design, especially when only small datasets are available.
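The abstract names the two ingredients, KNN regression and virtual sample generation, but does not spell out the VSG procedure. The Python sketch below is illustrative only: it assumes a SMOTE-style interpolation scheme in which each virtual sample lies on the segment between a real training sample and one of its nearest real neighbours, with the target interpolated by the same weight. The paper's actual VSG method, distance metric, and choice of k may differ, and every function name here (`knn_predict`, `generate_virtual_samples`, `evaluate_knn_vsg`) is hypothetical. Virtual samples are added to the training split only, so error is always measured on real held-out compounds.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """KNN regression: average the targets of the k training points nearest to x (Euclidean)."""
    neighbours = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    return sum(yi for _, yi in neighbours) / len(neighbours)


def generate_virtual_samples(X, y, n_virtual, k=3, seed=0):
    """Hypothetical VSG (assumed SMOTE-style, not necessarily the paper's method).

    Picks a random real sample, picks one of its k nearest real neighbours, and
    interpolates both the features and the target with the same weight t in [0, 1).
    Requires at least two real samples.
    """
    rng = random.Random(seed)
    virtual_X, virtual_y = [], []
    for _ in range(n_virtual):
        i = rng.randrange(len(X))
        # Indices of the k nearest real samples to X[i], excluding X[i] itself.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        j = rng.choice(others)
        t = rng.random()
        virtual_X.append([a + t * (b - a) for a, b in zip(X[i], X[j])])
        virtual_y.append(y[i] + t * (y[j] - y[i]))
    return virtual_X, virtual_y


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def evaluate_knn_vsg(train_X, train_y, test_X, test_y, n_virtual, k=3, seed=0):
    """Augment only the training split with virtual samples, then score KNN on real test data."""
    vX, vy = generate_virtual_samples(train_X, train_y, n_virtual, k=k, seed=seed)
    aug_X, aug_y = train_X + vX, train_y + vy
    preds = [knn_predict(aug_X, aug_y, x, k=k) for x in test_X]
    return rmse(test_y, preds)


if __name__ == "__main__":
    # Synthetic toy data (NOT from the paper): 2 descriptors -> inhibition efficiency (%).
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(12)]
    y = [70 + 10 * a - 5 * b for a, b in X]
    train_X, train_y, test_X, test_y = X[:9], y[:9], X[9:], y[9:]
    print("KNN only :", evaluate_knn_vsg(train_X, train_y, test_X, test_y, n_virtual=0))
    print("KNN + VSG:", evaluate_knn_vsg(train_X, train_y, test_X, test_y, n_virtual=30))
```

Calling `evaluate_knn_vsg` once with `n_virtual=0` and once with a positive value gives a quick baseline-versus-augmented comparison on the same real test split. Keeping virtual samples out of the test split is the important design choice: scoring on interpolated points would make the model look more accurate than it is.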
Pages: 10