A review of gene selection methods based on machine learning approaches

Times cited: 0
Authors
Lee, Hajoung [1, 2]
Kim, Jaejik [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Stat, Seoul, South Korea
[2] Sungkyunkwan Univ, Dept Stat, 25-2 Sungkyunkwan ro, Seoul 03063, South Korea
Funding
National Research Foundation of Singapore;
Keywords
gene selection; gene expression data; supervised learning; unsupervised learning; SUPERVISED FEATURE-SELECTION; SUPPORT VECTOR MACHINE; EXPRESSION DATA; CANCER CLASSIFICATION; MICROARRAY DATA; VARIABLE SELECTION; MUTUAL INFORMATION; FEATURE-EXTRACTION; FILTER; ALGORITHM;
DOI
10.5351/KJAS.2022.35.5.667
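Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;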
Abstract
Gene expression data represent the mRNA abundance level of each gene, and analyses of gene expression have provided key insights into disease mechanisms and the development of new drugs and therapies. High-throughput technologies such as DNA microarrays and RNA sequencing now enable the simultaneous measurement of thousands of gene expression levels, which gives gene expression data their characteristic high dimensionality. Because of this high dimensionality, models trained on gene expression data are prone to overfitting, and dimension reduction or feature selection is commonly applied as a preprocessing step to address the problem. In particular, gene selection methods can remove irrelevant and redundant genes and identify important ones during preprocessing. Various gene selection methods have been developed in the context of machine learning. In this paper, we review in depth recent work on gene selection methods based on machine learning approaches. We also discuss the underlying difficulties of current gene selection methods and future research directions.
Pages: 667-684
Number of pages: 18