Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features

Cited by: 11
Authors
Tian, Leqi [1, 2]
Wu, Wenbin [1]
Yu, Tianwei [1, 2, 3]
Affiliations
[1] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[2] Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Guangdong Prov Key Lab Big Data Comp, Shenzhen 518172, Peoples R China
Keywords
feature selection; random forest; gene network; cancer
DOI
10.3390/biom13071153
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Random Forest (RF) is a widely used machine learning method that performs well on classification and regression tasks. It works well when sample sizes are small, which benefits applications in biology. For example, gene expression data often involve far more features (p) than samples (n). Although the predictive accuracy of RF is often high, selecting important genes with RF is problematic: the important genes it selects are usually scattered across the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF), which identifies highly connected important features by using the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs while achieving classification accuracy equivalent to that of RF. To evaluate the proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas, and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of selected sub-graphs, and interpretable feature selection results suggest the method is a helpful addition to graph-based classification models and feature selection procedures.
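The abstract does not spell out how the network enters forest construction or how sub-graph connectivity is scored. The following is a minimal sketch under two illustrative assumptions that are not taken from the paper: that a tree's candidate features could be drawn from the k-hop neighborhood of a seed gene, and that connectivity of a selected feature set could be summarized by the size of its largest connected component. The function names and the toy network are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(adjacency, seed, k):
    """Return the set of nodes within k edges of `seed` (breadth-first search)."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def largest_component_size(adjacency, selected):
    """Size of the largest connected component of the sub-graph induced by `selected`."""
    unvisited = set(selected)
    best = 0
    while unvisited:
        start = unvisited.pop()
        stack = [start]
        size = 1
        while stack:
            node = stack.pop()
            for nbr in adjacency.get(node, ()):
                if nbr in unvisited:  # only follow edges inside the selected set
                    unvisited.remove(nbr)
                    stack.append(nbr)
                    size += 1
        best = max(best, size)
    return best


# Hypothetical toy gene network: a path g1 - g2 - g3 - g4 plus a separate pair g5 - g6.
toy_network = {
    "g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"],
    "g5": ["g6"], "g6": ["g5"],
}

# Assumed scheme: candidate features for one tree are the seed's 1-hop neighborhood.
candidate_features = k_hop_neighborhood(toy_network, "g2", 1)

# A scattered selection versus a connected one, scored by largest component size.
scattered_score = largest_component_size(toy_network, ["g1", "g4", "g5"])
connected_score = largest_component_size(toy_network, ["g1", "g2", "g3"])
```

On this toy network the scattered selection has no edges between its members, while the connected selection forms a single component, which is the contrast the abstract draws between features selected by RF and by GRF.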
Pages: 14