Estimating Haplotype Frequencies by Combining Data from Large DNA Pools with Database Information

Cited by: 12
Authors
Gasbarra, Dario [1]
Kulathinal, Sangita [2]
Pirinen, Matti [3]
Sillanpaa, Mikko J. [1,4]
Affiliations
[1] Univ Helsinki, Dept Math & Stat, FIN-00014 Helsinki, Finland
[2] Indic Soc Educ & Dev, Nasik, India
[3] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[4] Univ Helsinki, Dept Anim Sci, FIN-00014 Helsinki, Finland
Funding
Academy of Finland
Keywords
DNA pools; haplotype frequency estimation; HapMap database; multinomial distribution; POLYMERASE-CHAIN-REACTION; SPERM-TYPING DATA; LINKAGE-DISEQUILIBRIUM; HUMAN GENOME; HAPLOID DNA; PEDIGREES; ALGORITHM; SAMPLES; ASSOCIATION; POPULATION;
DOI
10.1109/TCBB.2009.71
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
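The normal approximation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Matlab implementation. It assumes diploid individuals, biallelic loci coded 0/1, and haplotype counts in a pool that follow a multinomial distribution; the function name `pooled_allele_moments` and the toy haplotype set are hypothetical. The sketch computes the mean vector and covariance matrix of the pooled allele frequencies, which are the two quantities a multinormal approximation needs.

```python
def pooled_allele_moments(haplotypes, theta, n_individuals):
    """Mean and covariance of pooled allele frequencies.

    Assumes haplotype counts in the pool are Multinomial(2n, theta),
    where n is the number of diploid individuals (an assumption of this
    sketch). Each haplotype is a tuple of 0/1 alleles, one per locus.
    """
    m = 2 * n_individuals  # number of chromosomes in the pool
    n_loci = len(haplotypes[0])

    # Expected frequency of allele "1" at each locus.
    mean = [
        sum(t * h[j] for h, t in zip(haplotypes, theta))
        for j in range(n_loci)
    ]

    # Covariance of the pooled frequencies:
    # (E[h_j * h_k] - E[h_j] * E[h_k]) / m
    cov = [[0.0] * n_loci for _ in range(n_loci)]
    for j in range(n_loci):
        for k in range(n_loci):
            joint = sum(t * h[j] * h[k] for h, t in zip(haplotypes, theta))
            cov[j][k] = (joint - mean[j] * mean[k]) / m
    return mean, cov


# Toy example: four possible haplotypes over two loci, one pool of 100 people.
haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
theta = [0.4, 0.1, 0.1, 0.4]
mean, cov = pooled_allele_moments(haplotypes, theta, n_individuals=100)
```

In this toy setting the off-diagonal covariance is positive because the two loci are in linkage disequilibrium; that dependence between loci is what makes pooled allele frequencies informative about haplotype frequencies.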
Pages: 36-44
Page count: 9