A comprehensive workflow for optimizing RNA-seq data analysis

Cited by: 7
Authors
Jiang, Gao [1]
Zheng, Juan-Yu [1]
Ren, Shu-Ning [2]
Yin, Weilun [2]
Xia, Xinli [2]
Li, Yun [1]
Wang, Hou-Ling [2]
Affiliations
[1] Beijing Forestry Univ, Sch Artificial Intelligence, Sch Informat Sci & Technol, Beijing 100083, Peoples R China
[2] Beijing Forestry Univ, Coll Biol Sci & Technol, Natl Engn Res Ctr Tree Breeding & Ecol Restorat, State Key Lab Tree Genet & Breeding, Beijing 100083, Peoples R China
Source
BMC GENOMICS | 2024 / Volume 25 / Issue 01
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
RNA-seq data; Differential gene analysis; Software comparison; DIFFERENTIAL EXPRESSION; ALIGNMENT; PROGRAM; HISAT;
DOI
10.1186/s12864-024-10414-y
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Current RNA-seq analysis software tends to apply similar parameters across species without accounting for species-specific differences. However, the suitability and accuracy of these tools may vary when analyzing data from different species, such as humans, animals, plants, fungi, and bacteria. For most laboratory researchers without a background in information science, constructing an analysis workflow that meets their specific needs from the array of complex analytical tools available is a significant challenge.

Results: Using RNA-seq data from plants, animals, and fungi, we observed that different analytical tools show some variation in performance when applied to different species. A comprehensive experiment was conducted specifically on data from plant pathogenic fungi, with differential gene analysis as the ultimate goal. In this study, 288 pipelines built from different tools were applied to five fungal RNA-seq datasets, and their results were evaluated based on simulation. This led to a relatively universal and superior fungal RNA-seq analysis pipeline that can serve as a reference, along with certain standards for selecting analysis tools. Additionally, we compared various tools for alternative splicing analysis. The results based on simulated data indicated that rMATS remained the optimal choice, although supplementing it with tools such as SpliceWiz could be considered.

Conclusion: The experimental results demonstrate that, compared with default software parameter configurations, the tuned analysis combinations can provide more accurate biological insights. It is beneficial to select suitable analysis software carefully based on the data, rather than choosing tools indiscriminately, in order to achieve high-quality analysis results more efficiently.
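The pipeline comparison described in the abstract can be pictured as enumerating every combination of tool choices across analysis stages and scoring each combination against a simulated ground truth. The following is a minimal sketch of that idea only, not the authors' implementation: the stage names, the placeholder tool names, the number of options per stage, and the use of an F1 score are all assumptions made for illustration, and the grid below does not reproduce the 288 pipelines in the study.

```python
from itertools import product

# Hypothetical stages and placeholder tool names (assumptions, not the
# tools evaluated in the paper).
STAGES = {
    "trimming": ["trimmer_A", "trimmer_B"],
    "alignment": ["aligner_A", "aligner_B", "aligner_C"],
    "quantification": ["counter_A", "counter_B"],
    "differential_expression": ["de_tool_A", "de_tool_B"],
}


def enumerate_pipelines(stages):
    """Return one dict per pipeline: the Cartesian product of stage options."""
    names = list(stages)
    return [
        dict(zip(names, combo))
        for combo in product(*(stages[name] for name in names))
    ]


def f1_score(called, truth):
    """F1 of a called gene set against a simulated set of true DE genes."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


pipelines = enumerate_pipelines(STAGES)

# Toy data: gene IDs are invented for the example.
simulated_truth = {"g1", "g2", "g3", "g4"}
example_call = {"g1", "g2", "g5"}
score = f1_score(example_call, simulated_truth)
```

In practice each pipeline would be run on the simulated reads, its list of differentially expressed genes would be passed to a scoring function such as `f1_score`, and the pipelines would then be ranked by score.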
Pages: 21