Adaptive feature selection with shapley and hypothetical testing: Case study of EEG feature engineering

Cited by: 23
Authors
Yin, Dingze [1]
Chen, Dan [1]
Tang, Yunbo [1]
Dong, Heyou [1]
Li, Xiaoli [2]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Beijing Normal Univ, Natl Key Lab Cognit Neurosci & Learning, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; SHapley Additive exPlanations; Tensor factorization; Feature engineering; Electroencephalogram; Autism spectrum disorder; Mutual information
DOI
10.1016/j.ins.2021.11.063
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Feature selection aims to explore the characteristics of a problem under investigation instead of focusing on extracting (deep) features or on classification tasks. The pending issues being explored are as follows: 1) to minimize the interference of uncertain irrelevant features and 2) to construct the (full) set of relevant features, since an individual feature might be too weak on its own, as with those extracted from bio-signals such as electroencephalograms (EEGs). This study fosters an adaptive feature selection approach with the Shapley value and hypothetical testing (abbrev. ShapHT+) via adaptive relevance evaluation. The tree SHAP (SHapley Additive exPlanations) method is first used to quantify the importance of each candidate feature. An adaptive threshold is then derived to evaluate the feature's relevance to a priori information, and all relevant features are then selected through hypothesis testing for the binomial distribution. The benchmarks indicate that ShapHT+ significantly outperforms its mainstream counterparts in ruling out interferences and in the efficiency and accuracy of selecting all relevant features. The case study of EEG feature engineering for autism spectrum disorder (ASD) evaluation indicates that the features selected by ShapHT+ 1) can achieve the highest classification accuracy (82.44%) and 2) well match the observations in recent ASD research. © 2021 Elsevier Inc. All rights reserved.
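The selection step described in the abstract (score each feature, compare it against a threshold, then keep the features that pass a binomial hypothesis test) can be illustrated with a minimal Python sketch. This is not the ShapHT+ algorithm itself: the abstract does not say how the adaptive threshold is derived, so the per-trial mean score used below is a stand-in assumption, and the names `binomial_tail` and `select_relevant`, the null probability `p=0.5`, and the level `alpha=0.05` are all hypothetical. The importance scores are assumed to have been computed beforehand (for example, as mean absolute tree SHAP values over repeated model fits) and are passed in as plain lists.

```python
from math import comb


def binomial_tail(hits, trials, p=0.5):
    """Return P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def select_relevant(importances, alpha=0.05):
    """Select features whose importance beats the threshold more often than chance.

    importances: list of trials, each a list of per-feature scores
                 (assumed precomputed, e.g. mean |SHAP| per feature).
    Returns the indices of the selected features.
    """
    n_trials = len(importances)
    n_features = len(importances[0])
    hits = [0] * n_features
    for scores in importances:
        # Assumption: the per-trial mean score stands in for the paper's
        # adaptive threshold, whose derivation the abstract does not give.
        threshold = sum(scores) / n_features
        for j, score in enumerate(scores):
            if score > threshold:
                hits[j] += 1
    # One-sided binomial test: keep a feature only if its hit count is
    # unlikely under the null hypothesis of a chance-level hit rate (p=0.5).
    return [
        j for j in range(n_features)
        if binomial_tail(hits[j], n_trials) < alpha
    ]


# Hypothetical scores: feature 0 dominates in all 10 trials.
trials = [[0.9, 0.05, 0.05] for _ in range(10)]
selected = select_relevant(trials)
```

With these made-up scores, only feature 0 exceeds the threshold in every trial, so it is the only one whose hit count rejects the chance-level null hypothesis.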
Pages: 374-390
Number of pages: 17