Effects of Random Forest Parameters in the Selection of Biomarkers

Times cited: 2
Authors
Khaire, Utkarsh Mahadeo [1]
Dhanalakshmi, R. [2]
Affiliations
[1] Natl Inst Technol Nagaland, Dept Comp Sci & Engn, Chumukedima 797103, India
[2] Indian Inst Informat Technol Tiruchirappalli, Dept Comp Sci & Engn, Tiruchirappalli 620015, Tamil Nadu, India
Keywords
microarray; curse of dimensionality; random forest; feature selection; high-dimensional dataset; cancer; classification; algorithm; signature
DOI
10.1093/comjnl/bxz161
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression profiling aids the diagnosis, prognosis and treatment of cancer. Because the nature of diseases changes frequently, a considerable volume of such data is generated. The main drawback of microarray data is the curse of dimensionality, which obscures useful information and leads to computational instability. The main objective of feature selection is to identify and remove insignificant and irrelevant features in order to determine the informative genes that cause cancer. Random forest is a classification algorithm well suited to microarray data. To measure variable importance, we propose using the out-of-bag (OOB) cases of every tree in the forest to count the number of votes cast for the correct class. Randomly permuting the values of a variable within these OOB cases and recounting the votes enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. The 'variable drop fraction' regulates the construction of the forest: a higher variable drop fraction decreases the dimensionality of the microarray data more efficiently. A forest built with 800 trees selects fewer important features at every variable drop fraction value, thereby reducing the dimensionality of the microarray data.
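The two mechanisms described in the abstract, OOB permutation importance and backward elimination governed by a variable drop fraction, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. To keep it short, each "tree" is a single-split stump grown on a bootstrap sample from a random subset of sqrt(m) candidate features; the rule of discarding the ceil(drop_frac · m) least important features per round, the function names (`fit_stump`, `oob_permutation_importance`, `backward_eliminate`) and the synthetic 12-gene dataset are all hypothetical choices made for illustration. The `n_trees` argument plays the role of forest size (the abstract's 800-tree forests would correspond to `n_trees=800`).

```python
import math
import random


def fit_stump(X, y, rows, features, rng):
    """Fit a one-split tree on the (bootstrap) rows, trying only a random
    subset of floor(sqrt(m)) candidate features. Simplification: a stump, not a full tree."""
    best = None
    n_try = max(1, int(math.sqrt(len(features))))
    for j in rng.sample(features, n_try):
        for t in sorted({X[r][j] for r in rows}):
            left = [y[r] for r in rows if X[r][j] <= t]
            right = [y[r] for r in rows if X[r][j] > t]
            if not left or not right:
                continue
            lc = max(set(left), key=left.count)  # majority class on each side
            rc = max(set(right), key=right.count)
            correct = left.count(lc) + right.count(rc)
            if best is None or correct > best[0]:
                best = (correct, j, t, lc, rc)
    if best is None:  # no usable split: always predict the majority class
        labels = [y[r] for r in rows]
        maj = max(set(labels), key=labels.count)
        return (features[0], math.inf, maj, maj)
    return best[1:]


def predict(stump, x):
    j, t, lc, rc = stump
    return lc if x[j] <= t else rc


def oob_permutation_importance(X, y, features, n_trees, rng):
    """Average drop in the fraction of OOB votes for the true class after
    randomly permuting one feature among each tree's OOB cases."""
    n = len(X)
    totals = {j: 0.0 for j in features}
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))  # out-of-bag cases
        if not oob:
            continue
        stump = fit_stump(X, y, boot, features, rng)
        base = sum(predict(stump, X[r]) == y[r] for r in oob)
        for j in features:
            values = [X[r][j] for r in oob]
            rng.shuffle(values)
            hits = 0
            for r, v in zip(oob, values):
                x = list(X[r])
                x[j] = v  # feature j permuted, all other features unchanged
                hits += predict(stump, x) == y[r]
            totals[j] += (base - hits) / len(oob)
    return {j: s / n_trees for j, s in totals.items()}


def backward_eliminate(X, y, drop_frac, n_trees=100, min_features=2, seed=0):
    """Rebuild the forest each round and discard the ceil(drop_frac * m)
    least important features (assumed rule); return the set after each round."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    history = [features]
    while len(features) > min_features:
        imp = oob_permutation_importance(X, y, features, n_trees, rng)
        n_drop = max(1, math.ceil(drop_frac * len(features)))
        n_drop = min(n_drop, len(features) - min_features)
        ranked = sorted(features, key=lambda j: imp[j])  # least important first
        features = sorted(ranked[n_drop:])
        history.append(features)
    return history


# Hypothetical toy data: 40 samples x 12 "genes"; only gene 0 carries the class signal.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(12)] for _ in range(40)]
y = [int(row[0] > 0.5) for row in X]

runs = {frac: backward_eliminate(X, y, drop_frac=frac) for frac in (0.2, 0.5)}
for frac, history in runs.items():
    # set sizes per round: 0.2 -> [12, 9, 7, 5, 4, 3, 2]; 0.5 -> [12, 6, 3, 2]
    print(frac, [len(s) for s in history], history[-1])
```

Under these assumptions, a drop fraction of 0.5 reaches the two-feature floor in three rounds, compared with six rounds at 0.2. This mirrors the abstract's observation that a higher variable drop fraction shrinks the feature set faster. Gene 0, the only informative feature, is expected to survive both runs.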
Pages: 1840-1847
Page count: 8