SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Times cited: 6

Authors:

Audoux, Jerome [1,2]

Salson, Mikael [3]

Grosset, Christophe F. [4]

Beaumeunier, Sacha [1,2]

Holder, Jean-Marc [1,2]

Commes, Therese [1,2]

Philippe, Nicolas [1,2]

Affiliations:

[1] CHR Montpellier, Hop St Eloi, IRMB, SeqOne, 80 Ave Augustin Fliche, F-34295 Montpellier, France

[2] Inst Computat Biol, 860 Rue St Priest, F-34095 Montpellier 5, France

[3] Univ Lille, CNRS, CRIStAL Ctr Rech Informat Signal & Automat Lille, INRIA,Cent Lille,UMR 9189, F-59000 Lille, France

[4] Univ Bordeaux, INSERM, BMGIC, U1035, F-33076 Bordeaux, France

Source:

BMC Bioinformatics | 2017 / Vol. 18

Keywords:

RNA-Seq; Transcriptomics; Benchmark; Pipeline optimization; Framework; Alignment; Discovery

DOI:

10.1186/s12859-017-1831-5

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with its own performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which tools best suit their specific needs and how those tools should be configured. To help answer these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematically evaluating and comparing their performance, so that users can make well-informed choices.

Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that reproduce specific real biological conditions as closely as possible, together with the list of genomic incidents and mutations that were inserted. BenchCT then compares the output of any bioinformatics pipeline run on a SimCT dataset with the simulated genomic and transcriptional variations that the dataset contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. The results revealed that performance in a given biological context varied significantly depending on the tools and settings used. We also found that combining the output of certain pipelines could yield substantial performance improvements.

Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question under investigation in order to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process.
Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. The SimBA software and dataset are available at http://cractools.gforge.inria.fr/softwares/simba/.
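The BenchCT step described in the abstract (scoring a pipeline's output against the events that SimCT inserted) can be illustrated with a minimal Python sketch. It assumes that events are matched exactly on a hypothetical (chromosome, position, type) key. The names `Event`, `evaluate` and `combine_intersection` are hypothetical and do not reflect BenchCT's actual interface or matching rules. Intersection is used here only as one simple way of combining two pipelines' outputs, because the abstract does not say which combination the authors used.

```python
from __future__ import annotations

from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event key (an assumption, not BenchCT's format)."""
    chrom: str
    pos: int
    kind: str  # e.g. "snv", "splice", "fusion"


def evaluate(predicted: set[Event], truth: set[Event]) -> dict[str, float]:
    """Score reported events against the simulated ground truth by exact match."""
    tp = len(predicted & truth)  # simulated and reported
    fp = len(predicted - truth)  # reported but never simulated
    fn = len(truth - predicted)  # simulated but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision, "f1": f1}


def combine_intersection(a: set[Event], b: set[Event]) -> set[Event]:
    """Keep only events reported by both pipelines (one possible combination)."""
    return a & b


def main() -> None:
    # Toy data for illustration only; not taken from the paper.
    truth = {Event("chr1", 100, "snv"), Event("chr1", 250, "snv"),
             Event("chr2", 500, "splice"), Event("chr3", 42, "fusion")}
    pipeline_a = {Event("chr1", 100, "snv"), Event("chr1", 250, "snv"),
                  Event("chr2", 500, "splice"), Event("chr5", 7, "snv")}
    pipeline_b = {Event("chr1", 100, "snv"), Event("chr2", 500, "splice"),
                  Event("chr3", 42, "fusion"), Event("chr6", 9, "snv")}
    combined = combine_intersection(pipeline_a, pipeline_b)
    for name, calls in [("A", pipeline_a), ("B", pipeline_b), ("A and B", combined)]:
        print(name, evaluate(calls, truth))


if __name__ == "__main__":
    main()
```

On this toy data, each pipeline alone reaches a sensitivity and precision of 0.75. Their intersection reaches a precision of 1.0 at a sensitivity of 0.5. These numbers only illustrate the trade-off a combination can make and are not results from the paper.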

Pages: 14
