Effects of Random Forest Parameters in the Selection of Biomarkers

Cited by: 2
Authors
Khaire, Utkarsh Mahadeo [1]
Dhanalakshmi, R. [2]
Affiliations
[1] Natl Inst Technol Nagaland, Dept Comp Sci & Engn, Chumukedima 797103, India
[2] Indian Inst Informat Technol Tiruchirappalli, Dept Comp Sci & Engn, Tiruchirappalli 620015, Tamil Nadu, India
Keywords
microarray; curse of dimensionality; random forest; feature selection; high-dimensional dataset; CANCER; CLASSIFICATION; ALGORITHM; SIGNATURE
DOI
10.1093/comjnl/bxz161
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression analysis helps with the diagnosis, prognosis and treatment of cancer. The nature of diseases changes frequently, which in turn generates a considerable volume of data. The main drawback of microarray data is the curse of dimensionality: it obscures useful information and leads to computational instability. The main objective of feature selection is to identify and remove insignificant and irrelevant features in order to determine the informative genes that cause cancer. Random forest is a well-suited classification algorithm for microarray data. To estimate the importance of the variables, we propose using the out-of-bag (OOB) cases of every tree in the forest to count the number of votes cast for the correct class. Randomly permuting the variables in these OOB cases enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. The 'variable drop fraction' regulates the forest construction. A higher variable drop fraction value reduces the dimensionality of the microarray data more efficiently. A forest built with 800 trees selects fewer important features at any variable drop fraction value, which reduces the dimensionality of the microarray data.
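The selection idea in the abstract (count correct OOB votes, permute a variable among the OOB cases, then discard a fraction of the least important variables) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary labels, uses one-split trees (decision stumps) in place of full trees to stay short, and stops at a fixed `min_features`, since the abstract does not state a stopping rule. All function names, parameter names and the toy data are hypothetical.

```python
import random


def fit_stump(X, y, rows, features):
    """Return (feature, threshold, left_label) with the fewest errors on `rows`."""
    best = None
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            # Errors if values <= t are labelled 0 and the rest 1.
            err0 = sum((0 if X[i][f] <= t else 1) != y[i] for i in rows)
            # With binary labels, the flipped labelling makes len(rows) - err0 errors.
            for left_label, err in ((0, err0), (1, len(rows) - err0)):
                if best is None or err < best[0]:
                    best = (err, f, t, left_label)
    if best is None:
        # Every candidate feature is constant on this sample: predict the majority.
        majority = int(2 * sum(y[i] for i in rows) >= len(rows))
        return features[0], float("inf"), majority
    return best[1:]


def predict(stump, value):
    """Classify one value of the stump's split feature."""
    _, t, left_label = stump
    return left_label if value <= t else 1 - left_label


def oob_importance(X, y, active, n_trees, rng):
    """Mean drop in correct OOB votes per tree after permuting each feature."""
    n = len(X)
    mtry = max(1, int(len(active) ** 0.5))  # features tried per tree
    importance = {f: 0.0 for f in active}
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]
        stump = fit_stump(X, y, bag, rng.sample(active, mtry))
        f = stump[0]
        correct = sum(predict(stump, X[i][f]) == y[i] for i in oob)
        # A stump uses a single feature, so only permuting that feature can
        # change its OOB votes; all other features contribute zero here.
        shuffled = [X[i][f] for i in oob]
        rng.shuffle(shuffled)
        permuted = sum(predict(stump, v) == y[i] for v, i in zip(shuffled, oob))
        importance[f] += (correct - permuted) / n_trees
    return importance


def select_features(X, y, drop_fraction=0.2, n_trees=200, min_features=2, seed=0):
    """Repeatedly drop the least important `drop_fraction` of the features."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    while len(active) > min_features:
        imp = oob_importance(X, y, active, n_trees, rng)
        ranked = sorted(active, key=lambda f: imp[f], reverse=True)
        n_drop = max(1, int(len(ranked) * drop_fraction))
        active = sorted(ranked[:max(min_features, len(ranked) - n_drop)])
    return active


def make_toy_data(n=40, n_features=10, seed=1):
    """Toy data: feature 0 tracks the label, the other features are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[y[i] + rng.gauss(0, 0.1)] + [rng.random() for _ in range(n_features - 1)]
         for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(select_features(X, y, drop_fraction=0.3, n_trees=50))
```

In this sketch, a larger `drop_fraction` removes more features per round and so reaches a small feature set in fewer forest rebuilds, which is consistent with the abstract's remark that a higher variable drop fraction reduces dimensionality more efficiently. The effect of the number of trees reported in the paper is not reproduced here.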
Pages: 1840-1847
Page count: 8