Quantifying the amount of missing information in genetic association studies

Cited by: 18
Author
Nicolae, Dan L.
Affiliations
[1] Univ Chicago, Dept Med, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Keywords
information content; multi-locus linkage disequilibrium; asymptotic relative efficiency; association testing; case-control design
DOI
10.1002/gepi.20181
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Many genetic analyses are done with incomplete information; for example, unknown phase in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information one data set (e.g. genotype data for a set of SNPs) carries for estimating parameters that are functions of frequencies in the second data set (e.g. haplotype frequencies), relative to the ideal case of actually observing the complete data (e.g. haplotypes). In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r^2 when each set consists of a single bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings, such as regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies. Genet. Epidemiol. 30:703-717, 2006. (c) 2006 Wiley-Liss, Inc.
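The single-marker special case mentioned in the abstract can be illustrated with a small sketch. It covers only the classical r^2 between two bi-allelic markers and the sample-size reading of that value; it is not the multi-locus measure developed in the paper. The function names `r_squared` and `required_sample_size` and the example frequencies are hypothetical, chosen for illustration.

```python
import math


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Classical r^2 between two bi-allelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1
          and allele B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # pairwise LD coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def required_sample_size(n_direct: int, r2: float) -> int:
    """Approximate sample size needed when typing a proxy marker.

    Assumes the asymptotic reading of r^2 as a ratio of sample sizes:
    a test on the proxy needs about n_direct / r2 subjects to match the
    power of a test on the directly observed marker with n_direct subjects.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_direct / r2)


if __name__ == "__main__":
    r2 = r_squared(p_ab=0.3, p_a=0.4, p_b=0.5)  # illustrative frequencies
    print(f"r^2 = {r2:.4f}")
    print("subjects needed via proxy:", required_sample_size(1000, r2))
```

Under this reading, a proxy marker with r^2 = 0.5 would need roughly twice as many subjects as direct genotyping of the marker of interest to reach the same asymptotic power.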
Pages: 703-717
Page count: 15