SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Cited by: 6
Authors
Audoux, Jerome [1, 2]
Salson, Mikael [3]
Grosset, Christophe F. [4]
Beaumeunier, Sacha [1, 2]
Holder, Jean-Marc [1, 2]
Commes, Therese [1, 2]
Philippe, Nicolas [1, 2]
Affiliations
[1] CHR Montpellier, Hop St Eloi, IRMB, SeqOne, 80 Ave Augustin Fliche, F-34295 Montpellier, France
[2] Inst Computat Biol, 860 Rue St Priest, F-34095 Montpellier 5, France
[3] Univ Lille, CNRS, CRIStAL Ctr Rech Informat Signal & Automat Lille, INRIA,Cent Lille,UMR 9189, F-59000 Lille, France
[4] Univ Bordeaux, INSERM, BMGIC, U1035, F-33076 Bordeaux, France
Source
BMC BIOINFORMATICS | 2017 / Volume 18
Keywords
RNA-Seq; Transcriptomics; Benchmark; Pipeline optimization; FRAMEWORK; ALIGNMENT; DISCOVERY; BENCHMARK;
DOI
10.1186/s12859-017-1831-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. To provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices.
Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, giving an accurate evaluation of the pipeline's performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved.
Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of datasets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. SimBA software and datasets are available at http://cractools.gforge.inria.fr/softwares/simba/.
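The evaluation idea described in the abstract (scoring a pipeline's calls against the list of simulated events, and checking whether combining pipelines helps) can be illustrated with a minimal Python sketch. This is not BenchCT's interface or scoring method, which the abstract does not describe; the `Variant` type, the `evaluate` function, and all of the example data below are hypothetical and assume that a call counts as correct only on an exact match.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical record for one simulated or called mutation."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted calls against the simulated truth.

    Assumption: a call is a true positive only if it matches a truth
    entry exactly (same chromosome, position, and alleles).
    """
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Made-up truth set, standing in for the event list a simulator would emit.
a = Variant("chr1", 100, "A", "T")
b = Variant("chr1", 250, "G", "C")
c = Variant("chr2", 75, "T", "G")
d = Variant("chr3", 900, "C", "A")
truth = {a, b, c, d}

# Made-up outputs of two hypothetical pipelines, each with one false call.
pipeline_a = {a, b, Variant("chr1", 500, "A", "G")}
pipeline_b = {b, c, Variant("chr2", 10, "C", "T")}

for name, calls in [
    ("pipeline A", pipeline_a),
    ("pipeline B", pipeline_b),
    ("union", pipeline_a | pipeline_b),          # favours recall
    ("intersection", pipeline_a & pipeline_b),   # favours precision
]:
    precision, recall = evaluate(calls, truth)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, taking the union of the two call sets raises recall at some cost in precision, while the intersection does the opposite. Which combination is preferable depends on the biological question, which is the point the abstract makes about benchmarking in context.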
Pages: 14