An embedded feature selection method based on generalized classifier neural network for cancer classification

Cited by: 6
Authors
Naik, Akshata K. [1]
Kuppili, Venkatanareshbabu [1]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, Ponda, Goa, India
Keywords
Embedded feature selection; Generalized classifier neural network; Explainable model; LOGISTIC-REGRESSION; ALGORITHM;
DOI
10.1016/j.compbiomed.2023.107677
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The selection of relevant genes plays a vital role in classifying high-dimensional microarray gene expression data. Sparse group Lasso and its variants have been employed for gene selection to capture the interactions of genes within a group. Most embedded methods are linear sparse learning models that fail to capture non-linear interactions. Additionally, very little attention has been given to multi-class problems. Existing methods create overlapping groups, which further increases dimensionality. The paper proposes a neural network-based embedded feature selection method that can represent non-linear relationships. In an effort toward an explainable model, a generalized classifier neural network (GCNN) is adopted as the model for the proposed embedded feature selection. GCNN has a well-defined architecture in terms of the number of layers and the neurons within each layer. Each layer has a distinct functionality, eliminating the obscure nature of most neural networks. The paper proposes a feature selection approach called Weighted GCNN (WGCNN) that embeds feature weighting as part of training the neural network. Since gene expression data comprise a large number of features, a statistically guided dropout is applied at the input layer to avoid overfitting. The proposed method works for binary as well as multi-class classification problems. Experimental validation is carried out on seven microarray datasets with three learning models, and the method is compared with six state-of-the-art methods that are popularly employed for feature selection. WGCNN performs well in terms of the F1 score and the number of features selected.
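The general idea of learning feature weights during training while applying a statistics-driven dropout to the inputs can be illustrated with a small NumPy sketch. This is not the WGCNN architecture or the authors' procedure: the softmax layer stands in for the network, the Fisher-style score used to guide dropout is an assumption, and every function name, the toy data, and all hyperparameters are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Between-class over within-class variance per feature (assumed guiding statistic)."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def keep_probabilities(scores, p_min=0.2, p_max=1.0):
    """Map scores to keep probabilities so higher-scoring features are dropped less often."""
    span = scores.max() - scores.min()
    scaled = (scores - scores.min()) / (span + 1e-12)
    return p_min + (p_max - p_min) * scaled


def guided_input_dropout(X, keep_prob, rng):
    """Per-feature input dropout with inverted-dropout rescaling."""
    mask = rng.random(X.shape) < keep_prob
    return X * mask / keep_prob


def train_weighted_softmax(X, y, keep_prob, epochs=300, lr=0.01, l1=0.01, seed=0):
    """Jointly learn per-feature weights w and a softmax layer V (stand-in model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    Y = np.eye(k)[y]                          # one-hot labels
    w = np.ones(d)                            # feature weights, L1-penalized
    V = rng.normal(scale=0.01, size=(d, k))   # softmax layer weights
    for _ in range(epochs):
        Xd = guided_input_dropout(X, keep_prob, rng)
        Z = (Xd * w) @ V
        Z -= Z.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                       # gradient of cross-entropy w.r.t. logits
        grad_V = (Xd * w).T @ G
        grad_w = ((G @ V.T) * Xd).sum(axis=0) + l1 * np.sign(w)
        V -= lr * grad_V
        w -= lr * grad_w
    return w, V


def feature_importance(w, V):
    """Scale-invariant importance: |w_j| times the norm of feature j's outgoing weights."""
    return np.abs(w) * np.linalg.norm(V, axis=1)


def make_toy_data(seed=0):
    """Three classes, 20 standardized features; only features 0 and 1 carry signal."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), 50)
    X = rng.normal(size=(150, 20))
    X[:, 0] += np.array([-2.0, 0.0, 2.0])[y]
    X[:, 1] += np.array([1.5, -3.0, 1.5])[y]
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    probs = keep_probabilities(fisher_scores(X, y))
    w, V = train_weighted_softmax(X, y, probs)
    ranking = np.argsort(feature_importance(w, V))[::-1]
    print("Top features by learned importance:", ranking[:5])
```

Ranking by the product of the feature weight and the norm of its outgoing weights avoids the scale ambiguity between the two parameter sets; a real implementation would also hold out data to choose how many features to keep.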
Pages: 11