ChimPipe: accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data

Times cited: 23
Authors
Rodriguez-Martin, Bernardo [1 ,2 ,3 ]
Palumbo, Emilio [1 ,2 ]
Marco-Sola, Santiago [4 ]
Griebel, Thasso [4 ]
Ribeca, Paolo [4 ,5 ]
Alonso, Graciela [6 ]
Rastrojo, Alberto [6 ]
Aguado, Begona [6 ]
Guigo, Roderic [1 ,2 ,7 ]
Djebali, Sarah [1 ,2 ,8 ]
Affiliations
[1] Barcelona Inst Sci & Technol, Ctr Genom Regulat CRG, Dr Aiguader 88, Barcelona 08003, Spain
[2] UPF, Barcelona, Spain
[3] Barcelona Supercomp Ctr, Joint IRB BSC Program Computat Biol, Jordi Girona 31, Barcelona 08034, Spain
[4] Ctr Nacl Anal Genom, Baldiri Reixac 4,Barcelona Sci Pk Tower 1, Barcelona 08028, Spain
[5] Pirbright Inst, Integrat Biol, Ash Rd, London GU24 0NF, England
[6] CSIC UAM, Ctr Biol Mol Severo Ochoa, Nicolas Cabrera 1, Madrid 28049, Spain
[7] Inst Hosp Mar Invest Med IMIM, Barcelona 08003, Spain
[8] Univ Toulouse, GenPhySE, INRA, ENVT,INPT, Castanet Tolosan, France
Source
BMC GENOMICS | 2017 / Vol. 18
Funding
National Institutes of Health (USA); Biotechnology and Biological Sciences Research Council (UK);
Keywords
Chimera; Transcript; Fusion gene; RNA-seq; Benchmark; Cancer; Simulation; Isoform; Splice junction; PROSTATE-CANCER; SEQUENCING DATA; HUMAN GENOME; HUMAN-CELLS; DISCOVERY; IDENTIFICATION; MECHANISMS; ALGORITHM; ALIGNMENT; LEUKEMIA;
DOI
10.1186/s12864-016-3404-9
Chinese Library Classification (CLC) code
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome. They can be explained by various biological mechanisms, such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, the outputs of different programs on the same dataset can be widely inconsistent and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, inconsistent results between simulated and real datasets, and gene-level rather than junction-level assessment.

Results: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points with validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on these data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high-confidence chimeras connecting the protein-coding sequences of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which were recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role. Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved.

Conclusions: ChimPipe combines discordant paired-end reads and split-reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.
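The abstract stresses assessing predictions at the level of exact junction coordinates rather than at the gene level. As a minimal sketch of what such an assessment involves, assuming each chimeric junction is represented by its donor and acceptor breakpoints (chromosome, position, strand), junction-level sensitivity, precision and F1 could be computed as below. The `Junction` type and the `junction_level_metrics` function are hypothetical illustrations, not part of ChimPipe or of the paper's published benchmark, and the coordinates are made up.

```python
from typing import Iterable, NamedTuple, Tuple


class Junction(NamedTuple):
    """Hypothetical representation of a chimeric junction: the donor and
    acceptor breakpoints, each given as chromosome, 1-based position and strand."""
    donor_chrom: str
    donor_pos: int
    donor_strand: str
    acceptor_chrom: str
    acceptor_pos: int
    acceptor_strand: str


def junction_level_metrics(
    predicted: Iterable[Junction], truth: Iterable[Junction]
) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F1) at the junction level.

    A prediction is a true positive only if all six junction fields match a
    true junction exactly: there is no tolerance window and no gene-level
    collapsing. Duplicate predictions are counted once.
    """
    pred_set = set(predicted)
    true_set = set(truth)
    tp = len(pred_set & true_set)
    sensitivity = tp / len(true_set) if true_set else 0.0
    precision = tp / len(pred_set) if pred_set else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    a = Junction("chr1", 1000, "+", "chr2", 5000, "+")
    b = Junction("chr3", 200, "-", "chr3", 9000, "-")
    c = Junction("chr3", 201, "-", "chr3", 9000, "-")  # b shifted by 1 bp

    s, p, f1 = junction_level_metrics(predicted=[a, c], truth=[a, b])
    print(f"sensitivity={s:.2f} precision={p:.2f} f1={f1:.2f}")
    # prints: sensitivity=0.50 precision=0.50 f1=0.50
```

Because matching is exact, a prediction that is off by a single base counts both as a false positive and as a missed true junction; a gene-level comparison would instead count it as correct, which is one reason the two kinds of assessment can disagree.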
Pages: 17