Identification of influential rare variants in aggregate testing using random forest importance measures

Cited by: 2
Authors
Blumhagen, Rachel Z. [1 ,2 ]
Schwartz, David A. [3 ]
Langefeld, Carl D. [4 ,5 ,6 ]
Fingerlin, Tasha E. [1 ,2 ,3 ]
Affiliations
[1] Natl Jewish Hlth, Ctr Genes Environm & Hlth, Denver, CO 80206 USA
[2] Colorado Sch Publ Hlth, Dept Biostat & Informat, Aurora, CO USA
[3] Univ Colorado, Sch Med, Aurora, CO USA
[4] Wake Forest Sch Med, Dept Biostat & Data Sci, Winston Salem, NC USA
[5] Wake Forest Baptist Med Ctr, Comprehens Canc Ctr, Winston Salem, NC USA
[6] Wake Forest Sch Med, Ctr Precis Med, Winston Salem, NC USA
Keywords
genetic association; idiopathic pulmonary fibrosis; random forest; rare variants; targeted sequencing; TERT PROMOTER MUTATIONS
DOI
10.1111/ahg.12509
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are "driving" the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33); both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT while maintaining comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study of idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight variants in TERT and seven in FAM13A. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
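The ideas in the abstract (accuracy-based importance, selecting influential variants, TPR/FPR with median and IQR across replicates, and vi-RF's importance-weighted variable sampling) can be illustrated with a minimal, standard-library-only Python sketch. It is not the RIFT package or the authors' implementation, and every function name is hypothetical. Assumptions: "RF:Accuracy" is taken to mean permutation-based mean decrease in accuracy; a generic `predict` callable stands in for a fitted forest; importance is computed on the data passed in rather than on out-of-bag samples; the selection threshold of 0 is illustrative; and the vi-RF step assumes candidate variants are drawn in proportion to first-pass importance, with negative scores clipped to zero.

```python
import random
from statistics import median, quantiles


def accuracy(predict, X, y):
    """Fraction of rows whose predicted case/control label matches y."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy after shuffling each variant column.

    Same idea as accuracy-based RF importance, but computed on the data
    passed in rather than on each tree's out-of-bag samples (an assumption).
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between variant j and status
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(predict, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def select_influential(scores, threshold=0.0):
    """Hypothetical rule: flag variants whose importance exceeds threshold."""
    return {j for j, s in enumerate(scores) if s > threshold}


def true_positive_rate(selected, causal):
    """Share of truly causal variants that were flagged."""
    return len(selected & causal) / len(causal)


def false_positive_rate(selected, causal, n_variants):
    """Share of null (non-causal) variants that were flagged."""
    null = set(range(n_variants)) - causal
    return len(selected & null) / len(null)


def summarize(rates):
    """Median and (Q1, Q3) of a rate across simulation replicates."""
    q1, _, q3 = quantiles(rates, n=4)
    return median(rates), (q1, q3)


def vi_weights(scores):
    """Assumed vi-RF step: importance -> sampling weights (negatives clipped)."""
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [c / total for c in clipped]


def sample_candidates(weights, mtry, rng):
    """Draw mtry distinct variant indices, each draw proportional to weight."""
    pool, w, chosen = list(range(len(weights))), list(weights), []
    for _ in range(min(mtry, len(pool))):
        if sum(w) == 0:  # only zero-weight variants left: fall back to uniform
            w = [1.0] * len(w)
        k = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(k))
        w.pop(k)
    return chosen


# Toy data: 20 subjects, 4 variants; only variant 0 (2 carriers) drives status.
data_rng = random.Random(1)
X = [[1 if i < 2 else 0] + [int(data_rng.random() < 0.1) for _ in range(3)]
     for i in range(20)]
y = [row[0] for row in X]


def predict(row):
    """Stand-in for a fitted forest: predicts case status from variant 0."""
    return row[0]


scores = permutation_importance(predict, X, y)
selected = select_influential(scores)
weights = vi_weights(scores)
```

On this toy data only variant 0 receives positive importance, so it is the only one flagged and the only one offered as a split candidate under the weighted sampling. In a real analysis `predict` would be a fitted forest and importance would come from out-of-bag samples; the toy example only checks that the mechanics behave as described.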
Pages: 184-195
Number of pages: 12