Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing

Cited by: 7
Authors
Orabi, Baraa [1 ]
Xie, Ning [2 ]
McConeghy, Brian [2 ]
Dong, Xuesen [2 ,3 ]
Chauve, Cedric [4 ]
Hach, Faraz [1 ,2 ,3 ]
Affiliations
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC, Canada
[2] Vancouver Prostate Ctr, Dept Math, Vancouver, BC, Canada
[3] Univ British Columbia, Dept Urol Sci, Vancouver, BC, Canada
[4] Simon Fraser Univ, Dept Math, Burnaby, BC, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
RNA; RECONSTRUCTION; QUANTIFICATION;
DOI
10.1093/nar/gkac1112
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns can act as independent onco-drivers. In addition, cancer-specific AS is a potentially effective target for personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely span more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing is a promising approach for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. To do so, it first partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using integer linear programming (ILP).
We compare the performance of Freddie on simulated datasets with that of other isoform detection tools that depend on annotation databases to varying degrees. We show that Freddie is more accurate than the other tools, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie yields isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
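Illustrative note (not part of the original record): the abstract's idea of representing each read as a subset of canonical exon segments can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Freddie's actual method. Here segments are cut at every aligned-interval boundary and reads with identical segment vectors are grouped, whereas Freddie solves the segmentation with dynamic programming and the clustering with an ILP. The function names and the half-open `(start, end)` interval encoding are hypothetical.

```python
from collections import defaultdict


def canonical_segments(reads):
    """Cut the covered range at every aligned-interval boundary (toy rule)."""
    breakpoints = sorted({pos for read in reads for interval in read for pos in interval})
    return list(zip(breakpoints[:-1], breakpoints[1:]))


def to_binary(read, segments):
    """Return 1 for each segment fully inside one of the read's intervals, else 0."""
    return tuple(
        int(any(start <= seg_start and seg_end <= end for start, end in read))
        for seg_start, seg_end in segments
    )


def group_identical(reads):
    """Group read indices by identical segment vectors (assumes error-free reads)."""
    segments = canonical_segments(reads)
    clusters = defaultdict(list)
    for idx, read in enumerate(reads):
        clusters[to_binary(read, segments)].append(idx)
    return segments, dict(clusters)


# Hypothetical split alignments: each read is a list of (start, end) intervals.
reads = [
    [(100, 200), (300, 400)],
    [(100, 200), (300, 400)],
    [(100, 200), (250, 400)],  # uses an alternative 3' splice site
]
segments, clusters = group_identical(reads)
```

In this toy input the third read extends its second exon upstream, so it receives a different segment vector and ends up in its own group. Real long reads carry alignment errors, which is why the abstract describes clustering and error correction as a joint optimization rather than exact matching.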
Pages: 10
Related papers
50 in total
  • [1] A global survey of alternative splicing of HBV transcriptome using long-read sequencing
    Guan, Guiwen
    Zou, Jun
    Zhang, Ting
    Lu, Fengmin
    Chen, Xiangmei
    JOURNAL OF HEPATOLOGY, 2022, 76 (01) : 234 - 236
  • [2] LoRID: a bioinformatic pipeline to discover alternative isoforms using nanopore long-read sequencing
    Soirat, Nicolas
    Aucouturier, Camille
    Philippe, Nicolas
    Vaur, Dominique
    Bertrand, Denis
    Raphael, Leman
    Krieger, Sophie
    Castera, Laurent
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 : 669 - 670
  • [3] Reply to: "A global survey of alternative splicing of HBV transcriptome using long-read sequencing"
    Yuan, Shilin
    Yang, Yuedong
    Hu, Ronggui
    JOURNAL OF HEPATOLOGY, 2022, 76 (01) : 236 - 237
  • [4] High resolution annotation of zebrafish transcriptome using long-read sequencing
    Nudelman, German
    Frasca, Antonio
    Kent, Brandon
    Sadler, Kirsten C.
    Sealfon, Stuart C.
    Walsh, Martin J.
    Zaslavsky, Elena
    GENOME RESEARCH, 2018, 28 (09) : 1415 - 1425
  • [5] Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing
    Treutlein, Barbara
    Gokce, Ozgun
    Quake, Stephen R.
    Suedhof, Thomas C.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2014, 111 (13) : E1291 - E1299
  • [6] Multiplexed Assembly and Annotation of Synthetic Biology Constructs Using Long-Read Nanopore Sequencing
    Emiliani, Francesco E.
    Hsu, Ian
    McKenna, Aaron
    ACS SYNTHETIC BIOLOGY, 2022, 11 (07) : 2238 - 2246
  • [7] Long-read sequencing reveals the landscape of aberrant alternative splicing and novel therapeutic target in colorectal cancer
    Sun, Qiang
    Han, Ye
    He, Jianxing
    Wang, Jie
    Ma, Xuejie
    Ning, Qianqian
    Zhao, Qing
    Jin, Qian
    Yang, Lili
    Li, Shuang
    Li, Yang
    Zhi, Qiaoming
    Zheng, Junnian
    Dong, Dong
    GENOME MEDICINE, 2023, 15 (01)
  • [8] Alternative splicing in head and neck squamous cell carcinoma: public database exploration and long-read sequencing
    Abe, Tatsuya
    Ling, Yiwei
    Okuda, Shujiro
    Yamazaki, Manabu
    Maruyama, Satoshi
    Tanuma, Junichi
    CANCER SCIENCE, 2024, 115 : 1083 - 1083
  • [9] Long-read sequencing reveals the landscape of aberrant alternative splicing and novel therapeutic target in colorectal cancer
    Qiang Sun
    Ye Han
    Jianxing He
    Jie Wang
    Xuejie Ma
    Qianqian Ning
    Qing Zhao
    Qian Jin
    Lili Yang
    Shuang Li
    Yang Li
    Qiaoming Zhi
    Junnian Zheng
    Dong Dong
    Genome Medicine, 15
  • [10] High-resolution annotation of the mouse preimplantation embryo transcriptome using long-read sequencing
    Qiao, Yunbo
    Ren, Chao
    Huang, Shisheng
    Yuan, Jie
    Liu, Xingchen
    Fan, Jiao
    Lin, Jianxiang
    Wu, Susu
    Chen, Qiuzhen
    Bo, Xiaochen
    Li, Xiangyang
    Huang, Xingxu
    Liu, Zhen
    Shu, Wenjie
    NATURE COMMUNICATIONS, 2020, 11 (01)