Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13
Authors
Nieto-del-Amor, Felix [1 ]
Prats-Boluda, Gema [1 ]
Garcia-Casado, Javier [1 ]
Diaz-Martinez, Alba [1 ]
Jose Diago-Almela, Vicente [2 ]
Monfort-Ortiz, Rogelio [2 ]
Hao, Dongmei [3 ]
Ye-Lin, Yiyao [1 ]
Affiliations
[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain
[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain
[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Keywords
genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS;
DOI
10.3390/s22145098
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models' real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting that EHG could be used to predict preterm labor in clinics.
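The leakage issue raised in the abstract comes down to ordering: resampling has to happen after the dataset is split, and only inside a partition. The following is a minimal sketch of that ordering using random undersampling to 1:1. It is not the authors' pipeline (which resampled the validation dataset during genetic-algorithm feature selection); the function names, the 20/180 class counts, and the 20% test fraction are all hypothetical values chosen for illustration.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def undersample_to_balance(indices, labels, rng):
    """Randomly drop records so every class matches the smallest class (1:1)."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    kept = []
    for cls in sorted(by_class):
        kept.extend(rng.sample(by_class[cls], n_min))
    return sorted(kept)


# Hypothetical imbalanced labels: 1 = preterm (minority), 0 = term (majority).
rng = random.Random(0)
labels = [1] * 20 + [0] * 180

# Step 1: split first, so the test partition is fixed before any resampling.
train_idx, test_idx = stratified_split(labels, 0.2, rng)

# Step 2: resample only inside the training partition.
balanced_train_idx = undersample_to_balance(train_idx, labels, rng)

print(len(train_idx), len(test_idx), len(balanced_train_idx))
```

Because the test indices are chosen before resampling and never passed to `undersample_to_balance`, the test partition keeps its original class ratio and shares no records with the balanced training set. The same ordering applies to oversampling methods: synthetic records would be generated from training records only.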
Pages: 18
Related papers
50 in total
  • [31] A Machine Learning-Based Framework for Accurate and Early Diagnosis of Liver Diseases: A Comprehensive Study on Feature Selection, Data Imbalance, and Algorithmic Performance
    Rehman, Attique Ur
    Butt, Wasi Haider
    Ali, Tahir Muhammad
    Javaid, Sabeen
    Almufareh, Maram Fahaad
    Humayun, Mamoona
    Rahman, Hameedur
    Mir, Azka
    Shaheen, Momina
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2024, 2024
  • [32] Stable and Accurate Feature Selection from Microarray Data with Ensembled Fast Correlation Based Filter
    Wang, Aiguo
    Liu, Huancheng
    Liu, Jinjun
    Ding, Huitong
    Yang, Jing
    Chen, Guilin
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 2996 - 2998
  • [33] Correlation-based feature selection of single cell transcriptomics data from multiple sources
    Mitic, Nenad S.
    Malkov, Sasa N.
    Ruzicic, Mirjana M. Maljkovic
    Veljkovic, Aleksandar N.
    Cukic, Ivan Lj.
    Lin, Xin
    Lyu, Minjie
    Brusic, Vladimir
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [34] An Efficient Filter-Based Feature Selection Model to Identify Significant Features from High-Dimensional Microarray Data
    Raj, D. M. Deepak
    Mohanasundaram, R.
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2020, 45 (04) : 2619 - 2630
  • [35] A robust method for early diagnosis of autism spectrum disorder from EEG signals based on feature selection and DBSCAN method
    Abdolzadegan, Donya
    Moattar, Mohammad Hossein
    Ghoshuni, Majid
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2020, 40 (01) : 482 - 493
  • [36] Diagnosis of Obstructive Sleep Apnea Using Feature Selection, Classification Methods, and Data Grouping Based Age, Sex, and Race
    Sheta, Alaa
    Thaher, Thaer
    Surani, Salim R.
    Turabieh, Hamza
    Braik, Malik
    Too, Jingwei
    Abu-El-Rub, Noor
    Mafarjah, Majdi
    Chantar, Hamouda
    Subramanian, Shyam
    DIAGNOSTICS, 2023, 13 (14)
  • [37] Cross-subject driver status detection from physiological signals based on hybrid feature selection and transfer learning
    Chen, Lan-lan
    Zhang, Ao
    Lou, Xiao-guang
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 137 : 266 - 280
  • [38] A filter-predictor polynomial feature based machine learning approach to predicting preterm birth from cervical electrical impedance spectroscopy
    Tian, David
    Lang, Zi-Qiang
    Di Zhang, Di
    Anumba, Dilly O.
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 80
  • [39] A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data
    Aziz, Rabia
    Verma, C. K.
    Srivastava, Namita
    GENOMICS DATA, 2016, 8 : 4 - 15
  • [40] Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients
    Chushig-Muzo, David
    Calero-Diaz, Hugo
    Lara-Abelenda, Francisco J.
    Gomez-Martinez, Vanesa
    Granja, Conceicao
    Soguero-Ruiz, Cristina
    IEEE ACCESS, 2024, 12 : 84292 - 84305