Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

Cited by: 13
Authors
Nieto-del-Amor, Felix [1]
Prats-Boluda, Gema [1]
Garcia-Casado, Javier [1]
Diaz-Martinez, Alba [1]
Jose Diago-Almela, Vicente [2]
Monfort-Ortiz, Rogelio [2]
Hao, Dongmei [3]
Ye-Lin, Yiyao [1]
Affiliations
[1] Univ Politecn Valencia, Ctr Invest & Innovac Bioingn, E-46022 Valencia, Spain
[2] HUP La Fe, Serv Obstet, Valencia 46026, Spain
[3] Beijing Univ Technol, Fac Environm & Life, Beijing Int Sci & Technol Cooperat Base Intellige, Beijing 100124, Peoples R China
Keywords
genetic algorithm; imbalance data learning; electrohysterography; preterm labor prediction; resampling methods; uterine electromyography; machine learning; CLASSIFICATION; CLASSIFIERS; PERFORMANCE; ALGORITHM; ACCURACY; LABOR; TERM; SETS
DOI
10.3390/s22145098
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models' real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting the EHG could be used to predict preterm labor in clinics.
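The ordering the abstract argues for (partition first, resample afterwards) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy data, the class ratio, the split fractions, and the helper names `stratified_split` and `undersample_to_balance` are all assumptions made for illustration, and plain random undersampling stands in for whichever undersampling method the paper used. The test partition is never resampled, and only the validation partition is balanced to 1:1.

```python
import numpy as np


def stratified_split(y, holdout_fraction, rng):
    """Return (keep_pos, holdout_pos): positions into y, split per class."""
    keep_pos, holdout_pos = [], []
    for label in np.unique(y):
        pos = rng.permutation(np.flatnonzero(y == label))
        n_holdout = max(1, int(round(holdout_fraction * pos.size)))
        holdout_pos.extend(pos[:n_holdout])
        keep_pos.extend(pos[n_holdout:])
    return np.array(keep_pos), np.array(holdout_pos)


def undersample_to_balance(idx, y, rng):
    """Randomly drop samples from idx until every class has the minority count."""
    labels = y[idx]
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    kept = [rng.choice(idx[labels == c], size=n_keep, replace=False)
            for c in classes]
    return np.concatenate(kept)


rng = np.random.default_rng(0)

# Hypothetical toy data: 45 "preterm" (1) and 255 "term" (0) records.
y = np.array([1] * 45 + [0] * 255)
X = rng.normal(size=(y.size, 10))

# 1) Split BEFORE any resampling, so no record (or anything derived
#    from it) can end up on both sides of a partition boundary.
rest_idx, test_idx = stratified_split(y, 0.2, rng)
train_pos, val_pos = stratified_split(y[rest_idx], 0.25, rng)
train_idx, val_idx = rest_idx[train_pos], rest_idx[val_pos]

# 2) Resample only inside a partition: here the validation set is
#    undersampled to 1:1; training and test data are left untouched.
val_bal_idx = undersample_to_balance(val_idx, y, rng)

X_val_bal, y_val_bal = X[val_bal_idx], y[val_bal_idx]
```

In a feature-selection loop, the balanced validation partition (`X_val_bal`, `y_val_bal`) would be the data on which each candidate feature subset is scored, while the final performance estimate comes from the untouched `test_idx` records.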
Pages: 18