Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities

Times cited: 173
Authors
Bainbridge, Matthew N. [1, 2]
Wang, Min [1]
Wu, Yuanqing [1]
Newsham, Irene [1]
Muzny, Donna M. [1]
Jefferies, John L. [3]
Albert, Thomas J. [4]
Burgess, Daniel L. [4]
Gibbs, Richard A. [1]
Affiliations
[1] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
[2] Baylor Coll Med, Dept Struct & Computat Biol & Mol Biophys, Houston, TX 77030 USA
[3] Baylor Coll Med, Dept Pediat Cardiol, Houston, TX 77030 USA
[4] Roche NimbleGen Inc, Madison, WI 53719 USA
Source
GENOME BIOLOGY | 2011 / Vol. 12 / No. 7
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
GENOME BROWSER DATABASE; SHORT-READ; MUTATION-RATES; CAPTURE; GENE; TRANSCRIPTION; SELECTION; VERTEBRATE; ALIGNMENT; OREGANNO
DOI
10.1186/gb-2011-12-7-r68
Chinese Library Classification (CLC) codes
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein-coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, as well as non-coding functional elements such as untranslated and regulatory regions. Consequently, the number of variants per base pair (variant density) in regions outside the CCDS, and our ability to interrogate those regions, are less well understood.

Results: We examine capture sequence data from outside the CCDS regions and find that extremes of GC content present in different subregions of the genome can reduce local capture sequence coverage to less than 50% of that observed for the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNAs and predicted exons, the capture process yields higher than expected coverage when compared to whole-genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.

Conclusions: We show that regions outside the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5 times higher than that observed in the CCDS.
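The abstract defines variant density as the number of variants per base pair and compares it between region classes; it also links GC-content extremes to reduced capture coverage. The sketch below illustrates those two quantities under stated assumptions: variants are hypothetical `(chrom, pos)` tuples with 0-based positions, target regions are half-open intervals as in BED files, and `Interval`, `merge_intervals`, `variant_density` and `gc_fraction` are illustrative helpers, not the authors' pipeline. The toy coordinates are invented and do not reflect the paper's data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical helper: half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start


def merge_intervals(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping intervals so no base pair (or variant) is counted twice."""
    merged: list[Interval] = []
    for iv in sorted(intervals, key=lambda i: (i.chrom, i.start)):
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Interval(last.chrom, last.start, max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


def variant_density(variants: list[tuple[str, int]], targets: list[Interval]) -> float:
    """Variants per base pair: variants inside the targets / total targeted bp."""
    regions = merge_intervals(targets)
    total_bp = sum(r.length() for r in regions)
    if total_bp == 0:
        raise ValueError("target set covers zero base pairs")
    # Linear scan for clarity; each variant is counted at most once.
    hits = sum(
        1
        for chrom, pos in variants
        if any(r.chrom == chrom and r.start <= pos < r.end for r in regions)
    )
    return hits / total_bp


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


if __name__ == "__main__":
    # Toy, invented coordinates for illustration only.
    ccds = [Interval("chr1", 0, 1000)]
    predicted = [Interval("chr1", 5000, 5400), Interval("chr1", 5300, 5500)]
    variants = [("chr1", 10), ("chr1", 500), ("chr1", 5100), ("chr1", 5450), ("chr1", 9000)]
    ratio = variant_density(variants, predicted) / variant_density(variants, ccds)
    print(f"density ratio (predicted / CCDS): {ratio:.1f}")  # prints "density ratio (predicted / CCDS): 2.0"
```

Merging overlapping intervals first keeps a base pair shared by two overlapping exon records from inflating the denominator. The linear scan is quadratic in the worst case; for genome-scale inputs an interval index or a tool such as BEDTools would be the usual choice.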
Pages: 12