Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features

Cited by: 11
Authors
Tian, Leqi [1, 2]
Wu, Wenbin [1]
Yu, Tianwei [1, 2, 3]
Affiliations
[1] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[2] Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Guangdong Prov Key Lab Big Data Comp, Shenzhen 518172, Peoples R China
Keywords
feature selection; random forest; gene network; cancer
DOI
10.3390/biom13071153
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Random Forest (RF) is a widely used machine learning method that performs well on classification and regression tasks. It works well with small sample sizes, which benefits applications in biology. For example, gene expression data often involve far more features (p) than samples (n). Although the predictive accuracy of RF is often high, selecting important genes with RF is problematic: the selected genes are usually scattered across the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF), which identifies highly connected important features by incorporating the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs while achieving classification accuracy equivalent to RF. To evaluate the proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas, and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of the selected sub-graphs, and interpretable feature selection results suggest that the method is a helpful addition to graph-based classification models and feature selection procedures.
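The abstract contrasts features that are "scattered on the gene network" with features that "form highly connected sub-graphs". The sketch below is one possible way to make that contrast concrete. It is not the authors' implementation: the function names, the toy gene identifiers (g1 to g7), and the choice of "largest connected component" as the connectivity measure are all assumptions made for illustration. The k-hop helper shows how a graph neighbourhood around a seed feature could be used to restrict candidate features; whether GRF does exactly this is not stated in the abstract.

```python
from collections import deque


def k_hop_neighborhood(adjacency, seed, k):
    """Return every node within k edges of `seed`, including `seed` itself.

    Hypothetical helper: one way a known network could limit the candidate
    features considered around a seed feature.
    """
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def largest_component_size(adjacency, selected):
    """Size of the largest connected component of the subgraph induced by `selected`.

    Assumed connectivity measure: a scattered selection scores close to 1,
    a tightly connected selection scores close to len(selected).
    """
    unvisited = set(selected)
    best = 0
    while unvisited:
        stack = [unvisited.pop()]
        size = 1
        while stack:
            node = stack.pop()
            for neighbor in adjacency.get(node, ()):
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    stack.append(neighbor)
                    size += 1
        best = max(best, size)
    return best


# Toy undirected gene network (hypothetical): a path g1-g2-g3-g4,
# a separate edge g5-g6, and an isolated gene g7.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
    "g5": {"g6"},
    "g6": {"g5"},
    "g7": set(),
}

scattered_selection = ["g1", "g4", "g5", "g7"]
connected_selection = ["g1", "g2", "g3"]

print(largest_component_size(toy_network, scattered_selection))
print(largest_component_size(toy_network, connected_selection))
print(sorted(k_hop_neighborhood(toy_network, "g2", 1)))
```

In this toy network the scattered selection has no two selected genes adjacent to each other, while the connected selection forms a single path, which is the kind of difference the abstract describes between plain RF and GRF feature sets.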
Pages: 14