A filter approach for feature selection in classification: application to automatic atrial fibrillation detection in electrocardiogram recordings

Cited by: 16
Authors
Michel, Pierre [1]
Ngo, Nicolas [2]
Pons, Jean-Francois [3]
Delliaux, Stephane [4, 5]
Giorgi, Roch [6]
Affiliations
[1] Aix Marseille Univ, EHESS, CNRS, AMSE, Cent Marseille, Marseille, France
[2] Aix Marseille Univ, INSERM, IRD, Sci Econ & Sociales Sante & Traitement Informat, Marseille, France
[3] WitMonki SAS, Marseille, France
[4] Aix Marseille Univ, INSERM, INRAE, C2VN, Marseille, France
[5] Hop Nord Marseille, APHM, Serv Explorat Fonct Resp, Pole Cardiovasc, Marseille, France
[6] Aix Marseille Univ, APHM, INSERM, Hop Timone Biostat & Technol Informat & Commun Bi, IRD, Sci Econ & Sociales Sante & Traitement Inform, Marseille, France
Keywords
gamma-metric; Machine learning; Feature selection; Classification; Clinical decision making; Atrial fibrillation detection; ALGORITHMS
DOI
10.1186/s12911-021-01427-8
Chinese Library Classification number
R-058
Abstract
Background: In high-dimensional data analysis, the complexity of predictive models can be reduced by selecting the most relevant features, which is crucial for reducing data noise and increasing model accuracy and interpretability. Thus, in clinical decision making, only the most relevant features from a set of medical descriptors should be considered when determining whether a patient is healthy. This statistical approach, known as feature selection, can be performed through regression or classification, in a supervised or unsupervised manner. Several feature selection approaches using different mathematical concepts have been described in the literature. In the field of classification, a new approach has recently been proposed that uses the gamma-metric, an index measuring separability between different classes in heart rhythm characterization. The present study proposes a filter approach for feature selection in classification using this gamma-metric and evaluates its application to automatic atrial fibrillation detection.
Methods: The stability and prediction performance of the gamma-metric feature selection approach were evaluated using a support vector machine model on two heart rhythm datasets, one extracted from the PhysioNet database and the other from the database of Marseille University Hospital Center, France (Timone Hospital). Both datasets contained electrocardiogram recordings grouped into two classes: normal sinus rhythm and atrial fibrillation. The performance of this feature selection approach was compared with that of three other approaches, two based on the Random Forest technique and one on receiver operating characteristic curve analysis.
Results: The gamma-metric approach showed satisfactory results, especially for models with a smaller number of features. For the training dataset, all prediction indicators were higher for our approach (accuracy greater than 99% for models with 5 to 17 features), as was stability (greater than 0.925 regardless of the number of features included in the model). For the validation dataset, the features selected with the gamma-metric approach differed from those selected with the other approaches; sensitivity was higher for our approach, but the other indicators were similar.
Conclusion: This filter approach for feature selection in classification opens up new methodological avenues for atrial fibrillation detection using short electrocardiogram recordings.
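As a rough illustration of what a filter approach does, the sketch below ranks features by a class-separability score computed independently of any classifier, then compares two selected subsets with a simple overlap index. It is not the gamma-metric, whose definition is not given in this record: the two-class Fisher score is swapped in as a stand-in, the Jaccard index is only one possible stability measure, and all function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Stand-in separability score (NOT the gamma-metric):
    squared difference of class means over the sum of class variances."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = pvariance(a) + pvariance(b)
    if denom == 0:
        # Constant within both classes: perfectly separable or useless.
        return float("inf") if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) ** 2 / denom


def rank_features(X, y):
    """Return feature indices ordered from most to least separable."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def jaccard(subset_a, subset_b):
    """Overlap between two selected feature subsets (assumed stability proxy)."""
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy data: 6 recordings, 3 features, labels 0 / 1.
X = [
    [0.80, 0.10, 5.0],
    [0.82, 0.90, 5.1],
    [0.79, 0.50, 4.9],
    [0.60, 0.20, 5.0],
    [0.58, 0.80, 5.2],
    [0.61, 0.40, 4.8],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y)
top_k = ranking[:2]  # keep the k best-ranked features for the classifier
```

In a full pipeline the retained columns would then be passed to a classifier such as a support vector machine, and the ranking would be repeated on resampled data to see how consistently the same features are chosen.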
Pages: 17