A robust model for read count data in exome sequencing experiments and implications for copy number variant calling

Cited by: 541
Authors
Plagnol, Vincent [1]
Curtis, James [2]
Epstein, Michael [1,3]
Mok, Kin Y. [4]
Stebbings, Emma [2]
Grigoriadou, Sofia [5]
Wood, Nicholas W. [4]
Hambleton, Sophie [6]
Burns, Siobhan O. [7]
Thrasher, Adrian J. [7]
Kumararatne, Dinakantha [8]
Doffinger, Rainer [8]
Nejentsev, Sergey [2]
Affiliations
[1] UCL, UCL Genet Inst, London, England
[2] Univ Cambridge, Dept Med, Cambridge CB2 2QQ, England
[3] UCL, UCL CoMPLEX Program, London, England
[4] UCL, UCL Inst Neurol, London, England
[5] Royal London Hosp, London E1 1BB, England
[6] Newcastle Univ, Inst Cellular Med, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[7] Great Ormond St Hosp Sick Children, Mol Immunol Unit, Wolfson Ctr Gene Therapy Childhood Dis, UCL Inst Child Hlth, London WC1N 3JH, England
[8] Addenbrookes Hosp, Dept Clin Biochem & Immunol, Cambridge, England
Funding
European Research Council; Wellcome Trust (UK);
Keywords
STRUCTURAL VARIANTS; PAIRED-END;
DOI
10.1093/bioinformatics/bts526
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Exome sequencing has proven to be an effective tool to discover the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders. However, calling CNVs from exome sequence data is challenging. A typical read depth strategy consists of using another sample (or a combination of samples) as a reference to control for the variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls.
Results: Here, we introduce ExomeDepth, a new CNV calling algorithm designed to control for this technical variability. ExomeDepth uses a robust model for the read count data and uses this model to build an optimized reference set in order to maximize the power to detect CNVs. As a result, ExomeDepth is effective across a wider range of exome datasets than the previously existing tools, even for small (e.g. one to two exons) and heterozygous deletions. We used this new approach to analyse exome data from 24 patients with primary immunodeficiencies. Depending on data quality and the exact target region, we find between 170 and 250 exonic CNV calls per sample. Our analysis identified two novel causative deletions in the genes GATA2 and DOCK8.
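The read depth strategy mentioned in the abstract (comparing per-exon read counts in a test sample against a reference) can be illustrated with a minimal sketch. This is not ExomeDepth's implementation: the abstract only says the model is "robust", so a plain binomial likelihood is assumed here for simplicity, and the function names (`binom_loglik`, `shifted_fraction`, `call_exons`) and the example counts are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k test reads out of n total reads.

    The binomial coefficient is omitted because it is identical for every
    hypothesis and cancels when likelihoods are compared.
    """
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def shifted_fraction(p0, copy_ratio):
    """Expected test-read fraction when the test depth is scaled by copy_ratio."""
    return copy_ratio * p0 / (copy_ratio * p0 + (1.0 - p0))


def call_exons(test_counts, ref_counts):
    """Assign each exon the copy-number state with the highest likelihood.

    Assumption: the genome-wide test fraction p0 is estimated from the
    summed counts, which presumes that most exons are copy-number normal.
    """
    total_test = sum(test_counts)
    total_ref = sum(ref_counts)
    p0 = total_test / (total_test + total_ref)

    # Heterozygous deletion halves the depth; a one-copy gain scales it by 1.5.
    # "normal" is listed first so that ties (e.g. zero reads) default to it.
    hypotheses = {"normal": 1.0, "deletion": 0.5, "duplication": 1.5}

    calls = []
    for t, r in zip(test_counts, ref_counts):
        n = t + r
        scores = {
            name: binom_loglik(t, n, shifted_fraction(p0, ratio))
            for name, ratio in hypotheses.items()
        }
        calls.append(max(scores, key=scores.get))
    return calls


if __name__ == "__main__":
    test = [100, 98, 50, 103, 150]   # hypothetical per-exon counts, test sample
    ref = [100, 100, 100, 100, 100]  # hypothetical per-exon counts, reference
    print(call_exons(test, ref))
```

A plain binomial model understates the between-sample technical variability that the abstract identifies as a source of spurious calls, which is why a model that allows for extra variance, together with a carefully chosen reference set, matters in practice.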
Pages: 2747-2754
Page count: 8