A comprehensive workflow for optimizing RNA-seq data analysis

Cited by: 7
Authors
Jiang, Gao [1]
Zheng, Juan-Yu [1]
Ren, Shu-Ning [2]
Yin, Weilun [2]
Xia, Xinli [2]
Li, Yun [1]
Wang, Hou-Ling [2]
Affiliations
[1] Beijing Forestry Univ, Sch Artificial Intelligence, Sch Informat Sci & Technol, Beijing 100083, Peoples R China
[2] Beijing Forestry Univ, Coll Biol Sci & Technol, Natl Engn Res Ctr Tree Breeding & Ecol Restorat, State Key Lab Tree Genet & Breeding, Beijing 100083, Peoples R China
Source
BMC GENOMICS | 2024 / Vol. 25 / Issue 01
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
RNA-seq data; Differential gene analysis; Software comparison; DIFFERENTIAL EXPRESSION; ALIGNMENT; PROGRAM; HISAT;
DOI
10.1186/s12864-024-10414-y
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Current RNA-seq analysis software tends to apply similar parameters across different species without considering species-specific differences. However, the suitability and accuracy of these tools may vary when analyzing data from different species, such as humans, animals, plants, fungi, and bacteria. For most laboratory researchers who lack a background in information science, constructing an analysis workflow that meets their specific needs from the array of complex analytical tools available poses a significant challenge.
Results: Using RNA-seq data from plants, animals, and fungi, we observed that different analytical tools show some variation in performance when applied to different species. A comprehensive experiment was conducted specifically on plant pathogenic fungal data, with differential gene analysis as the ultimate goal. In this study, 288 pipelines using different tools were applied to five fungal RNA-seq datasets, and the performance of their results was evaluated based on simulation. This led to a relatively universal and superior fungal RNA-seq analysis pipeline that can serve as a reference, along with certain standards for selecting analysis tools. Additionally, we compared various tools for alternative splicing analysis. The results based on simulated data indicated that rMATS remained the optimal choice, although it could be supplemented with tools such as SpliceWiz.
Conclusion: The experimental results demonstrate that, compared with default software parameter configurations, the tuned combinations of analysis tools can provide more accurate biological insights. It is beneficial to select analysis software carefully based on the data, rather than choosing tools indiscriminately, in order to achieve high-quality analysis results more efficiently.
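The benchmarking design summarized in the abstract (many tool combinations, each scored against simulated data) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the stage names, the placeholder tool labels, the number of options per stage, and the precision/recall/F1 scoring are hypothetical and are not taken from the paper, which reports only that 288 pipelines were evaluated.

```python
from itertools import product

# Hypothetical stage options. The labels are placeholders, not the tools
# benchmarked in the paper, and the counts per stage are illustrative only.
STAGES = {
    "trimming": ["trimmer_A", "trimmer_B"],
    "alignment": ["aligner_A", "aligner_B", "aligner_C"],
    "quantification": ["counter_A", "counter_B"],
    "differential_expression": ["de_A", "de_B"],
}


def enumerate_pipelines(stages):
    """Return every combination of one tool per stage as a dict."""
    names = list(stages)
    return [dict(zip(names, combo)) for combo in product(*stages.values())]


def score_calls(called, truth):
    """Precision, recall and F1 of called DE genes against simulated truth."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


pipelines = enumerate_pipelines(STAGES)
print(len(pipelines))  # 2 * 3 * 2 * 2 combinations

# Toy example: genes simulated as differentially expressed vs. genes a
# pipeline reported (both sets are made up for this sketch).
simulated_truth = {"g1", "g2", "g3", "g4"}
reported = {"g1", "g2", "g5"}
print(score_calls(reported, simulated_truth))
```

In a real comparison, each enumerated pipeline would be run on the same simulated dataset and the resulting scores ranked; the scoring metric used here is one common choice and may differ from the one used in the study.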
Pages: 21