Identification of influential rare variants in aggregate testing using random forest importance measures

Cited by: 2
Authors
Blumhagen, Rachel Z. [1 ,2 ]
Schwartz, David A. [3 ]
Langefeld, Carl D. [4 ,5 ,6 ]
Fingerlin, Tasha E. [1 ,2 ,3 ]
Affiliations
[1] Natl Jewish Hlth, Ctr Genes Environm & Hlth, Denver, CO 80206 USA
[2] Colorado Sch Publ Hlth, Dept Biostat & Informat, Aurora, CO USA
[3] Univ Colorado, Sch Med, Aurora, CO USA
[4] Wake Forest Sch Med, Dept Biostat & Data Sci, Winston Salem, NC USA
[5] Wake Forest Baptist Med Ctr, Comprehens Canc Ctr, Winston Salem, NC USA
[6] Wake Forest Sch Med, Ctr Precis Med, Winston Salem, NC USA
Keywords
genetic association; idiopathic pulmonary fibrosis; random forest; rare variants; targeted sequencing; TERT promoter mutations
DOI
10.1111/ahg.12509
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are "driving" the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33), and both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT, with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight and seven variants in TERT and FAM13A, respectively. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
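The abstract summarizes each method by its median true positive rate and interquartile range. As a minimal sketch of how such a summary might be computed, the Python below scores the variants flagged in each simulation replicate against a known causal set and then reports the median and IQR. The variant labels, the causal set, the replicate results, and the helper names `rates` and `summarize` are all hypothetical and chosen only for illustration. The quartile convention (`method="inclusive"`) is also an assumption, since the abstract does not state which one was used.

```python
from statistics import median, quantiles


def rates(flagged, causal, all_variants):
    """Return (TPR, FPR) for one replicate.

    TPR = flagged causal variants / all causal variants
    FPR = flagged non-causal variants / all non-causal variants
    """
    flagged, causal = set(flagged), set(causal)
    non_causal = set(all_variants) - causal
    tpr = len(flagged & causal) / len(causal) if causal else float("nan")
    fpr = len(flagged & non_causal) / len(non_causal) if non_causal else float("nan")
    return tpr, fpr


def summarize(values):
    """Return (median, (Q1, Q3)) using inclusive quartiles (an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical toy data: 10 variants, 4 of them truly causal.
variants = [f"v{i}" for i in range(1, 11)]
causal = ["v1", "v2", "v3", "v4"]

# Hypothetical variants flagged as influential in five simulated replicates.
replicates = [
    ["v1", "v7"],
    ["v1", "v2"],
    ["v3"],
    ["v1", "v2", "v3", "v9"],
    [],
]

per_replicate = [rates(flagged, causal, variants) for flagged in replicates]
tprs = [tpr for tpr, _ in per_replicate]
fprs = [fpr for _, fpr in per_replicate]

tpr_median, tpr_iqr = summarize(tprs)
fpr_median, fpr_iqr = summarize(fprs)
print(f"TPR median = {tpr_median:.2f}, IQR = ({tpr_iqr[0]:.2f}, {tpr_iqr[1]:.2f})")
print(f"FPR median = {fpr_median:.2f}, IQR = ({fpr_iqr[0]:.2f}, {fpr_iqr[1]:.2f})")
```

Reporting the median and IQR rather than the mean keeps the summary robust when per-replicate rates are skewed, as can happen when only a handful of variants are causal.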
Pages: 184-195
Page count: 12