LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing

Cited by: 34
Authors
Liu, Qian [1]
Hu, Yu [1]
Stucky, Andres [2]
Fang, Li [1]
Zhong, Jiang F. [2]
Wang, Kai [1,3]
Affiliations
[1] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[2] Univ Southern Calif, Keck Sch Med, Dept Otolaryngol, Los Angeles, CA 90033 USA
[3] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
Keywords
Gene fusion; Long-read sequencing; Transcriptome sequencing; Computational tool; RNA-SEQ; CANCER; EXPRESSION; QUANTIFICATION; IDENTIFICATION; TRANSLOCATION; BIOMARKERS; RECEPTORS; DISCOVERY; TARGET
DOI
10.1186/s12864-020-07207-4
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Long-read RNA-seq techniques can generate reads that span a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads shorter than 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors.
Results: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF outperformed existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset from acute myeloid leukemia and pinpointed, at base resolution, the exact location of a translocation previously known only at cytogenetic resolution; this result was further validated by Sanger sequencing.
Conclusions: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
Pages: 12