Background: Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to screen them with a statistical test, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable, one that takes into account interactions among variables without requiring model specification. Interactions increase the importance of the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs, using complex disease models with up to 32 loci that incorporate both genetic heterogeneity and multi-locus interaction.
Results: Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to the Fisher exact test for screening also increases. When the SNPs in the analysis do not interact, random forests perform similarly to the univariate Fisher exact test as a screening tool.
Conclusions: In large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs, or between SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
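The univariate screen used as the baseline above (rank SNPs by p-value and retain those with the lowest p-values) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy genotype matrix, and the choice to collapse genotypes into minor-allele carrier versus non-carrier before applying a two-sided Fisher exact test on the resulting 2x2 table are all assumptions made for illustration. The random forest importance measure is not implemented here.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def screen_by_p_value(genotypes, status, keep):
    """Rank SNPs by Fisher exact p-value and return the `keep` best indices.

    genotypes: one row per individual, one column per SNP, coded 0/1/2
               (assumed coding: count of minor alleles).
    status:    1 for cases, 0 for controls.
    """
    p_values = []
    for snp in range(len(genotypes[0])):
        a = b = c = d = 0
        for row, is_case in zip(genotypes, status):
            carrier = row[snp] > 0  # assumption: carrier vs non-carrier coding
            if is_case and carrier:
                a += 1
            elif is_case:
                b += 1
            elif carrier:
                c += 1
            else:
                d += 1
        p_values.append(fisher_exact_two_sided(a, b, c, d))
    ranked = sorted(range(len(p_values)), key=p_values.__getitem__)
    return ranked[:keep], p_values


# Hypothetical toy data: 4 cases then 4 controls, 3 SNPs.
genotypes = [
    [1, 2, 1],
    [0, 1, 1],
    [1, 1, 2],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
]
status = [1, 1, 1, 1, 0, 0, 0, 0]

kept, p_values = screen_by_p_value(genotypes, status, keep=1)
print("retained SNP indices:", kept)
print("p-values:", [round(p, 4) for p in p_values])
```

Because each SNP is tested on its own, a screen like this sees only marginal effects, which is the limitation the abstract describes for SNPs whose effects arise mainly through interaction.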