Effects of Random Forest Parameters in the Selection of Biomarkers

Cited by: 2
Authors
Khaire, Utkarsh Mahadeo [1]
Dhanalakshmi, R. [2]
Affiliations
[1] Natl Inst Technol Nagaland, Dept Comp Sci & Engn, Chumukedima 797103, India
[2] Indian Inst Informat Technol Tiruchirappalli, Dept Comp Sci & Engn, Tiruchirappalli 620015, Tamil Nadu, India
Keywords
microarray; curse of dimensionality; random forest; feature selection; high-dimensional dataset; CANCER; CLASSIFICATION; ALGORITHM; SIGNATURE
DOI
10.1093/comjnl/bxz161
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression helps with the diagnosis, prognosis and treatment of cancer. The nature of diseases frequently changes, which in turn generates a considerable volume of data. The main drawback of microarray data is the curse of dimensionality, which obscures useful information and leads to computational instability. The main objective of feature selection is to identify and remove insignificant and irrelevant features in order to determine the informative genes that cause cancer. Random forest is a classification algorithm well suited to microarray data. To assess the importance of the variables, we propose using the out-of-bag (OOB) cases in every tree of the forest to count the number of votes for the correct class. Randomly permuting the variables of these OOB cases enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. The 'variable drop fraction' regulates the forest construction. A higher variable drop fraction value efficiently decreases the dimensionality of the microarray data. A forest built with 800 trees chooses fewer important features under any variable drop fraction value, which reduces the dimensionality of the microarray data.
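The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes binary labels coded 0/1, uses single-split decision stumps as a stand-in for full trees, and stops at a fixed number of features rather than choosing the subset by OOB error. All function names and the parameters drop_fraction, n_trees and min_features are hypothetical. Each round scores variables by how many correct OOB votes are lost when a variable is permuted, then drops the lowest-ranked fraction and rebuilds the forest.

```python
import random

def predict(stump, row):
    """Vote of a stump (feature index, threshold, class predicted above it)."""
    j, thr, hi = stump
    return hi if row[j] > thr else 1 - hi

def fit_stump(X, y, rows, features):
    """Among `features`, pick the split with the fewest errors on `rows`."""
    best = None
    for j in features:
        v0 = [X[i][j] for i in rows if y[i] == 0]
        v1 = [X[i][j] for i in rows if y[i] == 1]
        if not v0 or not v1:
            continue  # bootstrap sample holds a single class
        mean0, mean1 = sum(v0) / len(v0), sum(v1) / len(v1)
        # Threshold halfway between the two class means of feature j.
        stump = (j, (mean0 + mean1) / 2, 1 if mean1 >= mean0 else 0)
        errors = sum(predict(stump, X[i]) != y[i] for i in rows)
        if best is None or errors < best[0]:
            best = (errors, stump)
    return best[1] if best else None

def oob_permutation_importance(X, y, features, n_trees, rng):
    """Mean drop in the fraction of correct OOB votes after permuting a variable."""
    n = len(X)
    mtry = max(1, int(len(features) ** 0.5))
    importance = {j: 0.0 for j in features}
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        stump = fit_stump(X, y, boot, rng.sample(features, mtry))
        if stump is None or not oob:
            continue
        j = stump[0]
        correct = sum(predict(stump, X[i]) == y[i] for i in oob)
        # Only the stump's own variable can change its vote, so permute that one.
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        permuted = 0
        for i, value in zip(oob, shuffled):
            row = list(X[i])
            row[j] = value
            permuted += predict(stump, row) == y[i]
        importance[j] += (correct - permuted) / len(oob) / n_trees
    return importance

def select_features(X, y, drop_fraction=0.2, n_trees=200, min_features=2, seed=0):
    """Repeatedly drop the least important fraction of variables and refit."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > min_features:
        imp = oob_permutation_importance(X, y, features, n_trees, rng)
        ranked = sorted(features, key=lambda j: imp[j], reverse=True)
        n_drop = max(1, int(len(features) * drop_fraction))
        features = ranked[:max(min_features, len(features) - n_drop)]
    return sorted(features)
```

In this sketch a larger drop_fraction removes more variables per round, so the loop reaches a small feature set in fewer forest rebuilds; n_trees controls how stable the importance ranking is from round to round.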
Pages: 1840-1847
Page count: 8