Screening the human exome: a comparison of whole genome and whole transcriptome sequencing

Times cited: 101
Authors
Cirulli, Elizabeth T. [1]
Singh, Abanish [1]
Shianna, Kevin V. [1]
Ge, Dongliang [1]
Smith, Jason P. [1]
Maia, Jessica M. [1]
Heinzen, Erin L. [1]
Goedert, James J. [2]
Goldstein, David B. [1]
Affiliations
[1] Duke Univ, Sch Med, Ctr Human Genome Variat, Durham, NC 27708 USA
[2] US Natl Canc Inst Hlth, Infect & Immunoepidemiol Branch, Div Canc Epidemiol & Genet, Rockville, MD 20852 USA
Source
GENOME BIOLOGY | 2010 / Vol. 11 / Issue 5
Keywords
Splice Junction; Read Depth; Exonic Variant; Exome Capture; False Positive Call
DOI
10.1186/gb-2010-11-5-r57
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: There is considerable interest in developing methods to efficiently identify all coding variants present in large sample sets of humans. Three approaches are possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. Although whole-genome sequencing is the most complete, it remains expensive enough that cost-effective alternatives are important.
Results: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing with those identified by high-coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well expressed in the source tissue. We also find that a high false-positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage.
Conclusions: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
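The comparison described in the abstract, treating whole-genome calls as the reference set and asking how many of them RNA-Seq recovers, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: variants are assumed to be already called and normalized into (chromosome, position, ref, alt) keys, the variant-to-gene mapping and the gene names GENE_A and GENE_B are hypothetical, and the "unconfirmed fraction" (RNA-Seq calls absent from the WGS set) is used here only as a rough proxy for the false-positive rate.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(
    wgs_calls: Set[Variant],
    rna_calls: Set[Variant],
    variant_gene: Dict[Variant, str],
    gene_subset: Optional[Set[str]] = None,
) -> Dict[str, float]:
    """Score RNA-Seq calls against WGS calls used as the reference set.

    When gene_subset is given, both call sets are restricted to variants
    annotated to those genes (e.g. genes assumed to be well expressed).
    """

    def keep(v: Variant) -> bool:
        # Unrestricted comparison, or variant annotated to one of the selected genes.
        return gene_subset is None or variant_gene.get(v) in gene_subset

    truth = {v for v in wgs_calls if keep(v)}
    calls = {v for v in rna_calls if keep(v)}
    shared = truth & calls       # WGS variants also recovered by RNA-Seq
    unconfirmed = calls - truth  # RNA-Seq calls absent from WGS: candidate false positives
    return {
        "sensitivity": len(shared) / len(truth) if truth else float("nan"),
        "unconfirmed_fraction": len(unconfirmed) / len(calls) if calls else float("nan"),
    }


# Toy, made-up calls for illustration only (not data from the study).
wgs = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 300, "G", "A"), ("2", 400, "T", "C")}
rna = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 500, "A", "C")}
genes = {
    ("1", 100, "A", "G"): "GENE_A",
    ("1", 200, "C", "T"): "GENE_A",
    ("2", 300, "G", "A"): "GENE_B",
    ("2", 400, "T", "C"): "GENE_B",
    ("3", 500, "A", "C"): "GENE_A",
}

all_genes = concordance(wgs, rna, genes)
expressed_only = concordance(wgs, rna, genes, gene_subset={"GENE_A"})
print(all_genes)       # sensitivity 0.5, unconfirmed_fraction about 0.333
print(expressed_only)  # sensitivity 1.0, unconfirmed_fraction about 0.333
```

Restricting the reference set to genes presumed to be expressed raises sensitivity in the toy example, which mirrors the qualitative pattern reported in the abstract; the specific numbers here are arbitrary and say nothing about the study's results.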
Pages: 8