Detection Theory in Identification of RNA-DNA Sequence Differences Using RNA-Sequencing

Times cited: 6
Authors
Toung, Jonathan M. [1]
Lahens, Nicholas [1]
Hogenesch, John B. [2, 3, 5]
Grant, Gregory [2, 3, 4]
Affiliations
[1] Univ Penn, Sch Med, Genom & Computat Biol Grad Program, Philadelphia, PA 19104 USA
[2] Univ Penn, Sch Med, Inst Biomed Informat, Philadelphia, PA 19104 USA
[3] Univ Penn, Sch Med, Inst Translat Med & Therapeut, Philadelphia, PA 19104 USA
[4] Univ Penn, Sch Med, Dept Genet, Philadelphia, PA 19104 USA
[5] Univ Penn, Sch Med, Dept Pharmacol, Philadelphia, PA 19104 USA
Source
PLOS ONE | 2014 / Vol. 9 / Issue 11
Funding
National Institutes of Health (NIH)
Keywords
EDITING SITES; ALLELIC EXPRESSION; PARALLEL DNA; TRANSCRIPTOME; ALIGNMENT; ORIGIN
DOI
10.1371/journal.pone.0112040
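Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09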
Abstract
Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the genome-wide study of RNA editing and, more generally, of sequence differences between RNA and DNA. In humans, two types of post-transcriptional RNA editing are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. Beyond these, researchers have reported all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies attribute the majority of these non-canonical differences to technical artifacts. In this study, we used a detection theory approach to evaluate how accurately RNA-Sequencing (RNA-Seq) and associated aligners identify RDDs. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing errors on the sensitivity and false discovery rate of RDD detection. We found that, even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be kept below 10% with relatively lenient thresholds. We also assessed the ability of various filters to remove false positive RDDs and found them effective in discriminating between true and false positives. Lastly, we applied the optimal thresholds identified in our simulations to detect RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G differences likely mediated by ADAR. Moreover, the majority of non-A-to-G RDDs were associated with poorer alignments, from which we conclude that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with appropriate thresholds and filters.
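The detection theory framing above amounts to comparing RDD calls made on simulated RNA-Seq data against the set of RDDs planted in the simulation, then reading off sensitivity, false negative rate, and false discovery rate. The following is a minimal Python sketch under stated assumptions: candidate mismatch sites and their read counts are assumed to be already extracted from the alignments, and the record type `CandidateSite`, the functions, and the thresholds `min_alt` and `min_frac` are hypothetical illustrations, not the authors' pipeline or the optimal thresholds they report. A-to-I editing is tallied as "A-to-G" because inosine is read as guanosine during reverse transcription and sequencing.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key identifying an RDD: (chromosome, position, DNA base, RNA base)
RddKey = Tuple[str, int, str, str]


@dataclass(frozen=True)
class CandidateSite:
    """A mismatch position seen in aligned RNA-Seq reads (hypothetical record type)."""
    chrom: str
    pos: int
    dna_base: str   # base in the genome (DNA)
    rna_base: str   # mismatching base observed in RNA reads
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting rna_base

    def key(self) -> RddKey:
        return (self.chrom, self.pos, self.dna_base, self.rna_base)


def call_rdds(sites: Iterable[CandidateSite], min_alt: int = 2,
              min_frac: float = 0.1) -> Set[RddKey]:
    """Call an RDD when read support passes both hypothetical thresholds:
    at least min_alt supporting reads and a supporting fraction >= min_frac."""
    return {
        s.key()
        for s in sites
        if s.depth > 0 and s.alt_reads >= min_alt and s.alt_reads / s.depth >= min_frac
    }


def detection_metrics(called: Set[RddKey], truth: Set[RddKey]) -> Dict[str, float]:
    """Score calls against the set of RDDs planted in the simulation."""
    tp = len(called & truth)  # planted RDDs that were recovered
    fp = len(called - truth)  # calls with no planted RDD (e.g. alignment or sequencing artifacts)
    fn = len(truth - called)  # planted RDDs that were missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": sensitivity,
        "false_negative_rate": 1.0 - sensitivity,
        "false_discovery_rate": fdr,
    }


def rdd_type_counts(called: Set[RddKey]) -> Counter:
    """Tally called RDDs by base-change type (12 possible), e.g. 'A-to-G'."""
    return Counter(f"{dna}-to-{rna}" for _, _, dna, rna in called)
```

For example, if three RDDs are planted, two are recovered, and one spurious site is called, `detection_metrics` reports a sensitivity of 2/3 and a false discovery rate of 1/3. Raising `min_alt` or `min_frac` tends to lower the false discovery rate at the cost of sensitivity; sweeping such thresholds over simulated data is one way to pick a working point.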
Pages: 12