Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13
Authors
Nieto-del-Amor, Felix [1]
Prats-Boluda, Gema [1]
Garcia-Casado, Javier [1]
Diaz-Martinez, Alba [1]
Jose Diago-Almela, Vicente [2]
Monfort-Ortiz, Rogelio [2]
Hao, Dongmei [3]
Ye-Lin, Yiyao [1]
Affiliations
[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain
[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain
[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Keywords
genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS
DOI
10.3390/s22145098
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models' real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 +/- 4.6%, average precision of 84.5 +/- 11.7%, maximum F1-score of 79.6 +/- 13.8%, and recall of 89.8 +/- 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting the EHG could be used to predict preterm labor in clinics.
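The leakage issue and the best-performing strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the 60/20/20 partition sizes and the 9:1 term/preterm ratio are assumptions, and the helper names `stratified_split` and `undersample_majority` are hypothetical. It only shows the ordering the abstract argues for: split first, then undersample the majority class to 1:1 in the validation partition alone, leaving the training and test partitions untouched.

```python
import numpy as np


def stratified_split(y, fractions, rng):
    """Split indices into len(fractions) partitions, preserving class ratios."""
    parts = [[] for _ in fractions]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        # Cut points for this class; the last partition takes the remainder.
        bounds = np.cumsum([int(round(f * len(idx))) for f in fractions[:-1]])
        for part, chunk in zip(parts, np.split(idx, bounds)):
            part.append(chunk)
    return [np.concatenate(p) for p in parts]


def undersample_majority(X, y, rng):
    """Randomly drop rows so every class has as many rows as the rarest one."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()
    return X[keep], y[keep]


rng = np.random.default_rng(0)

# Synthetic stand-in for EHG features: 270 term (0) and 30 preterm (1) records.
X = rng.normal(size=(300, 5))
y = np.array([0] * 270 + [1] * 30)

# 1) Split BEFORE any resampling, so no test record can influence it.
train_idx, val_idx, test_idx = stratified_split(y, (0.6, 0.2, 0.2), rng)

# 2) Balance only the validation partition (used to score feature subsets).
X_val_bal, y_val_bal = undersample_majority(X[val_idx], y[val_idx], rng)

# 3) Training and test partitions keep their original class imbalance.
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Generating synthetic minority samples on the full dataset before step 1 would place near-copies of test records in the training data, which is the overestimation the abstract warns about.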
Pages: 18