SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Times cited: 6
Authors
Audoux, Jerome [1 ,2 ]
Salson, Mikael [3 ]
Grosset, Christophe F. [4 ]
Beaumeunier, Sacha [1 ,2 ]
Holder, Jean-Marc [1 ,2 ]
Commes, Therese [1 ,2 ]
Philippe, Nicolas [1 ,2 ]
Affiliations
[1] CHR Montpellier, Hop St Eloi, IRMB, SeqOne, 80 Ave Augustin Fliche, F-34295 Montpellier, France
[2] Inst Computat Biol, 860 Rue St Priest, F-34095 Montpellier 5, France
[3] Univ Lille, CNRS, CRIStAL Ctr Rech Informat Signal & Automat Lille, INRIA,Cent Lille,UMR 9189, F-59000 Lille, France
[4] Univ Bordeaux, INSERM, BMGIC, U1035, F-33076 Bordeaux, France
Source
BMC Bioinformatics | 2017 / Vol. 18
Keywords
RNA-Seq; Transcriptomics; Benchmark; Pipeline optimization; Framework; Alignment; Discovery
DOI
10.1186/s12859-017-1831-5
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with its own performance characteristics and configuration parameters. Users face an increasingly complex task in determining which tools best suit their specific needs and how those tools should be configured. To help answer these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematically evaluating and comparing their performance, so that users can make well-informed choices.

Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that reproduce specific real biological conditions as closely as possible, together with the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatic pipeline run on a SimCT dataset against the simulated genomic and transcriptional variations that dataset contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. The results revealed that performance in a given biological context varied significantly depending on the tools and settings used. We also found that combining the output of certain pipelines could yield substantial performance improvements.

Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated in order to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatic pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. The SimBA software and dataset are available at http://cractools.gforge.inria.fr/softwares/simba/.
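The abstract describes BenchCT as comparing a pipeline's output with the variations SimCT inserted into a simulated dataset. As an illustration only, the Python sketch below scores a set of predicted events against a simulated truth set using sensitivity, precision and F1, and shows two simple ways of combining the calls of two pipelines (intersection and union). The event tuple format, the exact-match rule and the names `score`, `intersect_calls` and `union_calls` are hypothetical assumptions; they are not BenchCT's actual input format, matching criteria or interface, and the toy data is invented.

```python
"""Toy benchmark scoring against a simulated ground truth (illustrative sketch)."""
from dataclasses import dataclass
from typing import Set, Tuple

# Hypothetical event encoding (an assumption, not BenchCT's format):
# (chromosome, position, event type).
Event = Tuple[str, int, str]


@dataclass(frozen=True)
class Scores:
    tp: int  # predicted events that were actually simulated
    fp: int  # predicted events absent from the simulated truth
    fn: int  # simulated events the pipeline missed

    @property
    def sensitivity(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        s, p = self.sensitivity, self.precision
        return 2 * s * p / (s + p) if s + p else 0.0


def score(predicted: Set[Event], truth: Set[Event]) -> Scores:
    """Score calls by exact match on (chromosome, position, type); an assumed rule."""
    tp = len(predicted & truth)
    return Scores(tp=tp, fp=len(predicted - truth), fn=len(truth - predicted))


def intersect_calls(*calls: Set[Event]) -> Set[Event]:
    """Keep only events reported by every pipeline (at least one set required)."""
    return set.intersection(*calls)


def union_calls(*calls: Set[Event]) -> Set[Event]:
    """Keep events reported by any pipeline."""
    return set().union(*calls)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = {("chr1", 100, "SNV"), ("chr1", 500, "splice"),
             ("chr2", 42, "fusion"), ("chr3", 7, "indel")}
    pipeline_a = {("chr1", 100, "SNV"), ("chr1", 500, "splice"),
                  ("chr2", 42, "fusion"), ("chr5", 9, "SNV")}
    pipeline_b = {("chr1", 100, "SNV"), ("chr1", 500, "splice"),
                  ("chr3", 7, "indel"), ("chr6", 1, "SNV")}
    for name, calls in [("A", pipeline_a), ("B", pipeline_b),
                        ("A and B", intersect_calls(pipeline_a, pipeline_b)),
                        ("A or B", union_calls(pipeline_a, pipeline_b))]:
        s = score(calls, truth)
        print(f"{name}: sensitivity={s.sensitivity:.2f} "
              f"precision={s.precision:.2f} F1={s.f1:.2f}")
```

On this toy data each pipeline alone scores 0.75 on all three measures; the intersection raises precision to 1.00 but lowers sensitivity to 0.50, while the union reaches a sensitivity of 1.00 at a precision of 0.67. Whether a given combination helps in practice depends on the pipelines and the biological question, which is the kind of trade-off a SimCT/BenchCT benchmark is intended to measure.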
Pages: 14