Effects of Random Forest Parameters in the Selection of Biomarkers

Cited by: 2
Authors
Khaire, Utkarsh Mahadeo [1 ]
Dhanalakshmi, R. [2 ]
Affiliations
[1] Natl Inst Technol Nagaland, Dept Comp Sci & Engn, Chumukedima 797103, India
[2] Indian Inst Informat Technol Tiruchirappalli, Dept Comp Sci & Engn, Tiruchirappalli 620015, Tamil Nadu, India
Keywords
microarray; curse of dimensionality; random forest; feature selection; high-dimensional dataset; CANCER; CLASSIFICATION; ALGORITHM; SIGNATURE;
DOI
10.1093/comjnl/bxz161
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression helps with the diagnosis, prognosis and treatment of cancer. The nature of diseases changes frequently, which in turn generates a considerable volume of data. The main drawback of microarray data is the curse of dimensionality, which obscures useful information and leads to computational instability. The main objective of feature selection is to identify and remove insignificant and irrelevant features in order to determine the informative genes that cause cancer. Random forest is a well-suited classification algorithm for microarray data. To enhance the importance of the variables, we propose using the out-of-bag (OOB) cases in every tree of the forest to count the number of votes for the correct class. Randomly permuting the variables of these OOB cases enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. The 'variable drop fraction' regulates the forest construction. A higher variable drop fraction value efficiently decreases the dimensionality of the microarray data. A forest built with 800 trees chooses fewer important features under any variable drop fraction value, which reduces microarray data dimensionality.
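The permutation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a trained forest, the score is computed over all rows rather than over each tree's OOB cases, and `drop_fraction` is assumed here to mean the share of lowest-scoring variables discarded in one elimination round, which the abstract does not define.

```python
import random


def permutation_importance(predict, rows, labels, seed=0):
    """Drop in correct votes when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def correct_votes(candidate_rows):
        return sum(predict(r) == y for r, y in zip(candidate_rows, labels))

    baseline = correct_votes(rows)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(baseline - correct_votes(permuted))
    return scores


def drop_least_important(feature_ids, scores, drop_fraction):
    """Discard the lowest-scoring share of features (at least one)."""
    n_drop = max(1, int(len(feature_ids) * drop_fraction))
    ranked = sorted(zip(scores, feature_ids), reverse=True)
    kept = [fid for _, fid in ranked[:len(ranked) - n_drop]]
    return sorted(kept)


# Toy data: only feature 0 determines the label; features 1 and 2 are noise.
data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
labels = [r[0] > 0 for r in rows]


def toy_predict(row):
    # Hypothetical stand-in for a trained random forest (assumption).
    return row[0] > 0


scores = permutation_importance(toy_predict, rows, labels)
kept = drop_least_important([0, 1, 2], scores, drop_fraction=0.5)
```

Shuffling a noise feature leaves the vote count unchanged, so its score is zero, while shuffling the informative feature costs correct votes. Under the assumed meaning of `drop_fraction`, a larger value removes more variables per round, so fewer rounds are needed to shrink a high-dimensional dataset.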
Pages: 1840 - 1847
Number of pages: 8
Related papers
50 in total
  • [21] Evaluation of feature selection methods utilizing random forest and logistic regression for lung tissue categorization using HRCT images
    Vishraj, Rashmi
    Gupta, Savita
    Singh, Sukhwinder
    EXPERT SYSTEMS, 2023, 40 (08)
  • [22] Features Selection in Character Recognition with Random Forest Classifier
    Homenda, Wladyslaw
    Lesinski, Wojciech
    COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2011, 6922 : 93 - +
  • [23] Random forest for gene selection and microarray data classification
    Moorthy, Kohbalan
    Mohamad, Mohd Saberi
    BIOINFORMATION, 2011, 7 (03) : 142 - 146
  • [24] Krill Herd Optimization Algorithm for Cancer Feature Selection and Random Forest Technique for Classification
    Rani, R. Ranjani
    Ramyachitra, D.
    PROCEEDINGS OF 2017 8TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2017), 2017, : 109 - 113
  • [25] Extracting croplands in western Inner Mongolia by using random forest and temporal feature selection
    Su, Tengfei
    Zhang, Shengwei
    Tian, Ya'nan
    JOURNAL OF SPATIAL SCIENCE, 2020, 65 (03) : 519 - 537
  • [26] On Dynamic Selection of Subspace for Random Forest
    Adnan, Md Nasim
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014, 2014, 8933 : 370 - 379
  • [27] Distance Correlation-Based Feature Selection in Random Forest
    Ratnasingam, Suthakaran
    Munoz-Lopez, Jose
    ENTROPY, 2023, 25 (09)
  • [28] Microgrid fault classification based on random forest feature selection
    Wang, Changhong
    Gao, Yanjie
    Tang, Min
    REVIEWS OF ADHESION AND ADHESIVES, 2023, 11 (02) : 220 - 237
  • [29] A New Noisy Random Forest Based Method for Feature Selection
    Akhiat, Yassine
    Manzali, Youness
    Chahhou, Mohamed
    Zinedine, Ahmed
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2021, 21 (02) : 10 - 28
  • [30] Robustness of Random Forest-based gene selection methods
    Kursa, Miron Bartosz
    BMC BIOINFORMATICS, 2014, 15