Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures

Cited by: 22
Authors
Dong, Xueyi [1,2]
Du, Mei R. M. [1]
Gouil, Quentin [1,2]
Tian, Luyi [1,2,4]
Jabbari, Jafar S. [1,2]
Bowden, Rory [1,2]
Baldoni, Pedro L. [1,2]
Chen, Yunshun [1,2]
Smyth, Gordon K. [1,3]
Amarasinghe, Shanika L. [1,2,5]
Law, Charity W. [1,2]
Ritchie, Matthew E. [1,2]
Affiliations
[1] Walter & Eliza Hall Inst Med Res, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Med Biol, Parkville, Vic, Australia
[3] Univ Melbourne, Sch Math & Stat, Parkville, Vic, Australia
[4] Guangzhou Natl Lab, Guangzhou, Peoples R China
[5] Monash Univ, Australian Regenerat Med Inst, Clayton, Vic, Australia
Funding
UK Medical Research Council;
Keywords
QUALITY-CONTROL; R PACKAGE; QUANTIFICATION;
DOI
10.1038/s41592-023-02026-3
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The lack of benchmark data sets with inbuilt ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the six isoform detection tools tested; DESeq2, edgeR and limma-voom were the best among the five differential transcript expression tools tested; and there was no clear front-runner among the five tools compared for differential transcript usage analysis, which suggests that further methods development is needed for this application.
Pages: 1810-1821
Number of pages: 18