Regulatory Genes Through Robust-SNR for Binary Classification Within Functional Genomics Experiments

Cited by: 2
Authors
Hamraz, Muhammad [1]
Khan, Dost Muhammad [1]
Gul, Naz [1]
Ali, Amjad [1]
Khan, Zardad [1]
Ahmad, Shafiq [2]
Alqahtani, Mejdal [2]
Gardezi, Akber Abid [3]
Shafiq, Muhammad [4]
Affiliations
[1] Abdul Wali Khan Univ, Dept Stat, Mardan 23200, Pakistan
[2] King Saud Univ, Coll Engn, Ind Engn Dept, POB 800, Riyadh 11421, Saudi Arabia
[3] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad 45550, Pakistan
[4] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 74, No. 2
Keywords
Median absolute deviation (MAD); classification; feature selection; high dimensional gene expression datasets; signal to noise ratio; FEATURE-SELECTION; TUMOR;
DOI
10.32604/cmc.2023.030064
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This study proposes a novel feature selection technique that incorporates robustness into the conventional signal-to-noise ratio (SNR). The proposed method uses the median as a robust measure of location, and the median absolute deviation (MAD) and the interquartile range (IQR) as robust measures of variation, in the SNR. In this way, two independent robust signal-to-noise ratios (RSNR) are obtained. The method selects the most informative genes (features) by combining the minimum subset of genes obtained via a greedy search approach with the top-ranked genes selected through the RSNR. The results are compared with those of well-known gene/feature selection methods, using the classification error rate as the performance metric. Five gene expression datasets are used in the study. Different subsets of informative genes are selected by the proposed method and by each of the other methods, and their classification efficacy is assessed with support vector machine (SVM), random forest (RF), and k-nearest neighbors (k-NN) classifiers. The analysis shows that the proposed method (RSNR) yields lower error rates than the competing feature selection methods in the majority of cases. A detailed simulation study is also conducted to further assess the method.
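The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes the common two-class SNR form (difference in class locations divided by the sum of class spreads), with the median as location and either MAD or IQR as spread. The function names (`mad`, `iqr`, `robust_snr`, `rank_genes`), the zero-spread handling, and the toy data are hypothetical, and the greedy-search component of the method is not covered.

```python
from statistics import median, quantiles


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def iqr(values):
    """Interquartile range (needs at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def robust_snr(class0, class1, spread=mad):
    """Assumed robust SNR: |median0 - median1| / (spread0 + spread1)."""
    denom = spread(class0) + spread(class1)
    if denom == 0:
        # Assumption: a gene with zero spread in both classes scores 0.
        return 0.0
    return abs(median(class0) - median(class1)) / denom


def rank_genes(expr, labels, spread=mad, top_k=None):
    """Rank genes by robust SNR, highest first.

    expr   : dict mapping gene name -> list of expression values
    labels : list of 0/1 class labels aligned with those values
    """
    scores = {}
    for gene, values in expr.items():
        c0 = [v for v, y in zip(values, labels) if y == 0]
        c1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = robust_snr(c0, c1, spread)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Toy data (hypothetical): g1 separates the classes, g2 only has an outlier.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "g2": [2.0, 2.1, 1.9, 2.0, 2.2, 50.0],
}
print(rank_genes(expr, labels, spread=mad))
print(rank_genes(expr, labels, spread=iqr, top_k=1))
```

In this toy case the outlier in `g2` shifts neither the class median nor the MAD by much, so `g2` is not mistaken for a discriminating gene, which is the kind of behaviour a robust ratio is meant to provide.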
Pages: 3663-3677
Page count: 15