TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing

Times cited: 5
Authors
Xia, Yuchao [1,2]
Jin, Zijie [3,4]
Zhang, Chengsheng [2]
Ouyang, Linkun [5]
Dong, Yuhao [2]
Li, Juan [6]
Guo, Lvze [2]
Jing, Biyang [2]
Shi, Yang [7]
Miao, Susheng [8]
Xi, Ruibin [4,5,9]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Coll Sci, Beijing 100192, Peoples R China
[2] Beijing Genex Hlth Technol Co Ltd, Beijing 100195, Peoples R China
[3] Peking Univ, Peking Univ Int Canc Inst, Hlth Sci Ctr, Beijing 100191, Peoples R China
[4] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[5] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing 100871, Peoples R China
[6] Peking Univ, Coll Future Technol, Dept Biomed Engn, Beijing 100871, Peoples R China
[7] BeiGene Beijing Co Ltd, Beijing, Peoples R China
[8] Harbin Med Univ Canc Hosp, Dept Head & Neck Surg, Harbin 150081, Peoples R China
[9] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
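Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
RNA; CBL; CANCER; ALIGNMENT; ONCOGENE; PROTEIN; HISAT; CELL
DOI
10.1038/s41467-023-41649-0
Journal
Nature Communications, 14
Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract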
PacBio single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) of transcriptomes generates long and accurate reads, making it an ideal platform for full-length transcriptome analysis. We present TAGET, an integrated computational toolkit for analyzing Iso-Seq full-length transcript data. It covers transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression and differential isoform usage. We evaluate TAGET on a public Iso-Seq dataset and on newly sequenced Iso-Seq datasets from tumor patients. TAGET predicts novel splice sites with significantly higher precision and enables more accurate discovery of novel isoforms and gene fusions, as confirmed by experimental validation and by comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may act as a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET is a valuable computational toolkit that can be applied to many full-length transcriptome studies.

Accurate long-read RNA sequencing facilitates the analysis of full-length transcripts. Here the authors develop an integrative toolkit, optimised for Iso-Seq data analysis, that includes transcript alignment, annotation, quantification, and gene fusion detection.
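The abstract lists differential isoform usage (DIU) among TAGET's quantification analyses. The sketch below illustrates what DIU measures. It is not TAGET's implementation or statistical model. It compares one gene's per-isoform usage fractions between two conditions and computes a Pearson chi-square statistic on the 2 x K table of isoform read counts. The function names, isoform labels, and read counts are hypothetical toy values chosen for illustration.

```python
import math
from typing import Dict, Tuple

# Hypothetical helpers for illustration only; not part of TAGET's interface.
Counts = Dict[str, int]  # isoform ID -> number of full-length reads for one gene


def isoform_usage(counts: Counts) -> Dict[str, float]:
    """Fraction of one gene's reads assigned to each isoform (sums to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("gene has no reads in this condition")
    return {iso: n / total for iso, n in counts.items()}


def delta_usage(cond1: Counts, cond2: Counts) -> Dict[str, float]:
    """Change in usage fraction (cond2 minus cond1); missing isoforms count as 0."""
    u1, u2 = isoform_usage(cond1), isoform_usage(cond2)
    return {iso: u2.get(iso, 0.0) - u1.get(iso, 0.0)
            for iso in sorted(set(u1) | set(u2))}


def chi_square_statistic(cond1: Counts, cond2: Counts) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the
    2 x K table of isoform read counts (rows = conditions)."""
    # Keep only isoforms observed in at least one condition.
    isoforms = [iso for iso in sorted(set(cond1) | set(cond2))
                if cond1.get(iso, 0) + cond2.get(iso, 0) > 0]
    rows = [[cond1.get(i, 0) for i in isoforms],
            [cond2.get(i, 0) for i in isoforms]]
    row_totals = [sum(r) for r in rows]
    if 0 in row_totals:
        raise ValueError("each condition needs at least one read for this gene")
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(isoforms))]
    grand = sum(row_totals)
    stat = 0.0
    for r in range(2):
        for c in range(len(isoforms)):
            # Expected count if isoform usage were identical in both conditions.
            expected = row_totals[r] * col_totals[c] / grand
            stat += (rows[r][c] - expected) ** 2 / expected
    return stat, len(isoforms) - 1


def chi_square_p_value_df1(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom
    (applies only to genes with exactly two observed isoforms)."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Made-up read counts for a single two-isoform gene.
    normal = {"isoform_A": 80, "isoform_B": 20}
    tumor = {"isoform_A": 40, "isoform_B": 60}
    print({iso: round(d, 3) for iso, d in delta_usage(normal, tumor).items()})
    stat, df = chi_square_statistic(normal, tumor)
    if df == 1:
        print(f"chi2 = {stat:.2f}, p = {chi_square_p_value_df1(stat):.2e}")
```

A real DIU analysis would also account for biological replicates and for multiple testing across genes. This toy test treats every read as an independent count.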
Pages: 12