Identification of influential rare variants in aggregate testing using random forest importance measures

Cited by: 2
Authors
Blumhagen, Rachel Z. [1, 2]
Schwartz, David A. [3]
Langefeld, Carl D. [4, 5, 6]
Fingerlin, Tasha E. [1, 2, 3]
Affiliations
[1] Natl Jewish Hlth, Ctr Genes Environm & Hlth, Denver, CO 80206 USA
[2] Colorado Sch Publ Hlth, Dept Biostat & Informat, Aurora, CO USA
[3] Univ Colorado, Sch Med, Aurora, CO USA
[4] Wake Forest Sch Med, Dept Biostat & Data Sci, Winston Salem, NC USA
[5] Wake Forest Baptist Med Ctr, Comprehens Canc Ctr, Winston Salem, NC USA
[6] Wake Forest Sch Med, Ctr Precis Med, Winston Salem, NC USA
Keywords
genetic association; idiopathic pulmonary fibrosis; random forest; rare variants; targeted sequencing; TERT promoter mutations
DOI
10.1111/ahg.12509
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Aggregate tests of rare variants are often used to identify associated regions instead of testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are "driving" the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33); both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT, with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study of idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
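To illustrate how an accuracy-based importance measure can rank variants, here is a minimal, self-contained Python sketch built on several assumptions. It is not the RIFT package or the authors' implementation. The forest is a toy one made of depth-1 trees that split on carrier status (genotype > 0). Importance is computed as the mean drop in out-of-bag (OOB) accuracy after permuting a variant's genotypes, in the style of Breiman. The vi-RF step is interpreted as refitting with feature-sampling probabilities proportional to the first forest's importances. All function names (`simulate`, `rf_accuracy_importance`, `vi_rf_importance`) and the simulated data are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels, default):
    """Majority 0/1 class; `default` is used for empty leaves and ties."""
    cases = sum(labels)
    if not labels or 2 * cases == len(labels):
        return default
    return 1 if 2 * cases > len(labels) else 0


def fit_stump(X, y, idx, features):
    """Toy depth-1 tree: among `features`, pick the carrier (genotype > 0) vs
    non-carrier split with the lowest weighted Gini impurity on rows `idx`."""
    parent = majority([y[i] for i in idx], 0)
    best = None
    for f in features:
        carriers = [y[i] for i in idx if X[i][f] > 0]
        others = [y[i] for i in idx if X[i][f] == 0]
        score = len(carriers) * gini(carriers) + len(others) * gini(others)
        if best is None or score < best[0]:
            best = (score, f, majority(carriers, parent), majority(others, parent))
    return best[1:]  # (feature, prediction for carriers, prediction for non-carriers)


def sample_features(weights, mtry, rng):
    """Draw `mtry` distinct features with probability proportional to `weights`."""
    pool, w, chosen = list(range(len(weights))), list(weights), []
    for _ in range(min(mtry, len(pool))):
        k = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(k))
        w.pop(k)
    return chosen


def accuracy(genotypes, labels, pred_carrier, pred_other):
    """Fraction of `labels` predicted correctly by a carrier-status stump."""
    hits = sum((pred_carrier if g > 0 else pred_other) == t
               for g, t in zip(genotypes, labels))
    return hits / len(labels)


def rf_accuracy_importance(X, y, n_trees=200, mtry=3, weights=None, seed=1):
    """Permutation importance: mean decrease in OOB accuracy per variant."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    weights = weights or [1.0] * p
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]     # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not oob:
            continue
        f, pred_c, pred_o = fit_stump(X, y, boot, sample_features(weights, mtry, rng))
        labels = [y[i] for i in oob]
        observed = [X[i][f] for i in oob]
        permuted = rng.sample(observed, len(observed))
        # A stump only uses variant f, so permuting any other variant changes
        # nothing; those variants get a 0 contribution from this tree.
        importance[f] += (accuracy(observed, labels, pred_c, pred_o)
                          - accuracy(permuted, labels, pred_c, pred_o))
    return [v / n_trees for v in importance]


def vi_rf_importance(X, y, n_trees=200, mtry=3, seed=1, floor=1e-3):
    """Assumed vi-RF reading: refit with feature-sampling weights set to the
    first forest's importances (floored so every variant stays eligible)."""
    first = rf_accuracy_importance(X, y, n_trees, mtry, seed=seed)
    weights = [max(v, floor) for v in first]
    return rf_accuracy_importance(X, y, n_trees, mtry, weights=weights, seed=seed + 1)


def simulate(n=500, p=10, seed=7):
    """Hypothetical toy data: variant 0 raises case risk, variants 1..p-1 are noise.
    Genotypes are minor-allele counts (here only 0 or 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [1 if rng.random() < 0.05 else 0 for _ in range(p)]
        row[0] = 1 if rng.random() < 0.10 else 0
        X.append(row)
        y.append(1 if rng.random() < (0.95 if row[0] else 0.30) else 0)
    return X, y


X, y = simulate()
rf_imp = rf_accuracy_importance(X, y)
vi_imp = vi_rf_importance(X, y)
print(sorted(range(len(vi_imp)), key=lambda j: -vi_imp[j])[:3])
```

Stumps keep the sketch short. Because each stump has a single node, sampling features per tree is equivalent to sampling them per node. A production random forest grows deep trees and samples candidate features at every node. In R, for example, the randomForest package reports the mean decrease in accuracy via `importance(type = 1)` when the forest is fit with `importance = TRUE`.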
Pages: 184-195
Number of pages: 12
Related papers
  • [21] ISTRF: Identification of sucrose transporter using random forest
    Chen, Dong
    Li, Sai
    Chen, Yu
    FRONTIERS IN GENETICS, 2022, 13
  • [23] The behaviour of random forest permutation-based variable importance measures under predictor correlation
    Nicodemus, Kristin K.
    Malley, James D.
    Strobl, Carolin
    Ziegler, Andreas
    BMC BIOINFORMATICS, 2010, 11
  • [24] Random Forest Variable Importance Measures for Spatial Dynamics: Case Studies from Urban Demography
    Georgati, Marina
    Hansen, Henning Sten
    Kessler, Carsten
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2023, 12 (11)
  • [25] Assessing influential rainfall-runoff variables to simulate daily streamflow using random forest
    Vilaseca, Federico
    Castro, Alberto
    Chreties, Christian
    Gorgoglione, Angela
    HYDROLOGICAL SCIENCES JOURNAL, 2023, 68 (12) : 1738 - 1753
  • [26] Identification of functional rare variants in genome-wide association studies using stability selection based on random collapsing
    Huang, Xin
    Fang, Yixin
    Wang, Junhui
    BMC PROCEEDINGS, 5 (Suppl 9)
  • [27] Hyperspectral Identification of Ginseng Growth Years and Spectral Importance Analysis Based on Random Forest
    Zhao, Limin
    Liu, Shumin
    Chen, Xingfeng
    Wu, Zengwei
    Yang, Rui
    Shi, Tingting
    Zhang, Yunli
    Zhou, Kaiwen
    Li, Jiaguo
    APPLIED SCIENCES-BASEL, 2022, 12 (12)
  • [28] Investigating macro-level hotzone identification and variable importance using big data: A random forest models approach
    Jiang, Ximiao
    Abdel-Aty, Mohamed
    Hu, Jia
    Lee, Jaeyoung
    NEUROCOMPUTING, 2016, 181 : 53 - 63
  • [29] Identification of Nine mRNA Signatures for Sepsis Using Random Forest
    Zhou, Jing
    Dong, Siqing
    Wang, Ping
    Su, Xi
    Cheng, Liang
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2022, 2022
  • [30] Using random forest for brain tissue identification by Raman spectroscopy
    Zhang, Weiyi
    Giang, Chau Minh
    Cai, Qingan
    Badie, Behnam
    Sheng, Jun
    Li, Chen
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2023, 4 (04)