Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty

Cited by: 3
Authors
Kato, Mamoru [1]
Yoon, Seungtai [1]
Hosono, Naoya [2]
Leotta, Anthony [1]
Sebat, Jonathan [3]
Tsunoda, Tatsuhiko [2]
Zhang, Michael Q. [1, 4]
Affiliations
[1] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[2] RIKEN, Ctr Genom Med, Yokohama, Kanagawa 2300045, Japan
[3] Univ Calif San Diego, Dept Psychiat, La Jolla, CA 92093 USA
[4] Tsinghua Univ, Dept Automat, TNLIST, Bioinformat Div, Beijing 100084, Peoples R China
Funding
National Institutes of Health (USA);
Keywords
copy number; variation; EM algorithm; haplotype inference; phasing; POPULATION; INFERENCE; ASSOCIATION; VARIANTS; ALLELE; POLYMORPHISM; FREQUENCIES; ALGORITHM; SNPS;
DOI
10.1534/g3.111.000174
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007; 090102;
Abstract
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike the AB of a single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
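The probabilistic idea in the abstract can be illustrated with a much-reduced sketch. It is not the authors' algorithm: it handles a single CNV locus only, ignores sequence genotypes, and assumes Hardy-Weinberg equilibrium. The function name `em_copy_number_haplotypes`, its parameters, and the input layout (`lik[i, c]` = probability of individual i's signal given total copy number c) are all hypothetical choices made for this illustration. The sketch uses an EM loop to estimate per-chromosome copy-number frequencies while weighting every possible diplotype by how well it explains the noisy signal, rather than committing to one called copy number.

```python
import numpy as np


def em_copy_number_haplotypes(lik, max_hap_copies=2, n_iter=500, tol=1e-10):
    """Single-locus EM sketch (illustrative, not the published method).

    lik[i, c] is the assumed probability of individual i's signal given a
    total copy number c, for c = 0 .. 2 * max_hap_copies.
    Returns (freq, post): haplotype copy-number frequencies and, per
    individual, posterior probabilities over ordered diplotypes (a, b).
    """
    lik = np.asarray(lik, dtype=float)
    n, k = lik.shape[0], max_hap_copies + 1
    if lik.shape[1] != 2 * max_hap_copies + 1:
        raise ValueError("lik needs one column per total copy number")

    # total[a, b] = a + b, the total copy number of ordered diplotype (a, b)
    total = np.add.outer(np.arange(k), np.arange(k))
    pair_lik = lik[:, total]  # shape (n, k, k)

    def posterior(freq):
        # Hardy-Weinberg prior on ordered diplotypes times signal likelihood
        joint = np.outer(freq, freq)[None, :, :] * pair_lik
        return joint / joint.sum(axis=(1, 2), keepdims=True)

    freq = np.full(k, 1.0 / k)  # start from uniform frequencies
    for _ in range(n_iter):
        post = posterior(freq)  # E-step
        # M-step: expected count of each haplotype over both chromosomes
        new_freq = (post.sum(axis=2) + post.sum(axis=1)).sum(axis=0) / (2 * n)
        converged = np.abs(new_freq - freq).max() < tol
        freq = new_freq
        if converged:
            break
    return freq, posterior(freq)


# Usage: three individuals whose signals only weakly separate totals 0, 1, 2
# (haplotypes carry 0 or 1 copy). The values are made up for illustration.
noisy = [[0.7, 0.3, 0.0001], [0.2, 0.6, 0.2], [0.05, 0.35, 0.6]]
freq, post = em_copy_number_haplotypes(noisy, max_hap_copies=1)
best_diplotypes = [np.unravel_index(p.argmax(), p.shape) for p in post]
```

A deterministic approach would first round each row of `lik` to its most likely copy number and then phase those calls; keeping the full row lets ambiguous individuals contribute fractionally to several diplotypes. The sketch assumes every individual has non-zero likelihood for at least one diplotype with non-zero frequency; otherwise the normalization divides by zero.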
Pages: 35-42
Page count: 8