Reusable tutorials for using cloud-based computing environments for the analysis of bacterial gene expression data from bulk RNA sequencing

Times cited: 1
Authors
Allers, Steven [1]
O'Connell, Kyle A. [2,3]
Carlson, Thad [2,3]
Belardo, David [4]
King, Benjamin L. [1,5,6]
Affiliations
[1] Univ Maine, Dept Mol & Biomed Sci, 5735 Hitchner Hall, Orono, ME 04469 USA
[2] NIH, Ctr Informat Technol, 6555 Rock Spring Dr, Bethesda, MD 20817 USA
[3] Deloitte Consulting LLP, Hlth Data & AI, 1919 N Lynn St, Arlington, VA 22203 USA
[4] Google, Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190 USA
[5] MDI Biol Lab, Maine Inst Dev Award Network Biomed Res Excellence, 159 Old Bar Harbor Rd, Bar Harbor, ME 04609 USA
[6] Univ Maine, Grad Sch Biomed Sci & Engn, 5775 Stodder Hall, Orono, ME 04469 USA
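Funding
National Institutes of Health (NIH)
Keywords
gene expression; RNA sequencing; microbial genomics; analysis workflow; cloud computing; training; HISAT
DOI
10.1093/bib/bbae301
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract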
This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial "NIGMS Sandbox" at the beginning of this Supplement. This module delivers learning materials on RNA sequencing (RNAseq) data analysis in an interactive format that uses appropriate cloud resources for data access and analysis. Biomedical research is increasingly data-driven and dependent upon data management and analysis methods that facilitate rigorous, robust, and reproducible research. Cloud-based computing resources provide opportunities to broaden the application of bioinformatics and data science in research. Two obstacles for researchers, particularly those at small institutions, are (i) access to bioinformatics analysis environments tailored to their research and (ii) training in how to use cloud-based computing resources. We developed five reusable tutorials for bulk RNAseq data analysis to address these obstacles. Using Jupyter notebooks run on the Google Cloud Platform, the tutorials guide the user through a workflow featuring an RNAseq dataset from a study of prophage-altered drug resistance in Mycobacterium chelonae. The first tutorial uses a subset of the data so that users can learn the analysis steps rapidly, and the second uses the entire dataset. Next, a tutorial demonstrates how to analyze the read count data to generate lists of differentially expressed genes using R/DESeq2. Additional tutorials generate read counts using the Snakemake workflow manager and Nextflow with Google Batch. All tutorials are open-source and can be used as templates for other analyses.
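The DESeq2 tutorial starts from a matrix of read counts. Before testing for differential expression, DESeq2 corrects for differences in sequencing depth between samples, and by default its `estimateSizeFactors` step does this with the median-of-ratios method. The Python sketch below is an illustration under stated assumptions, not code from the tutorials: it re-implements only that normalization step in plain Python (the tutorials themselves use R), the helper names `size_factors` and `normalize` are hypothetical, the toy counts are invented, and dispersion estimation and statistical testing are omitted.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios method.

    counts: list of rows, one per gene; each row holds raw read counts,
    one per sample, in the same sample order.
    """
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of zero,
    # so they cannot serve in the pseudo-reference and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a nonzero count in every sample")

    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        logs = [math.log(c) for c in row]
        # Log of the gene's geometric mean across all samples.
        log_geo_mean = sum(logs) / n_samples
        for j, log_c in enumerate(logs):
            log_ratios[j].append(log_c - log_geo_mean)

    # A sample's size factor is the median ratio of its counts to the
    # pseudo-reference, taken on the log scale and exponentiated back.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Hypothetical toy matrix: sample 2 was sequenced twice as deeply.
    counts = [
        [10, 20],    # gene A
        [100, 200],  # gene B
        [5, 10],     # gene C
        [0, 7],      # gene D: zero in sample 1, excluded from estimation
    ]
    factors = size_factors(counts)
    print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
    print(normalize(counts, factors)[0])   # both entries close to 14.142
```

Dividing by the size factors places the samples on a common scale, which is why genes A, B and C end up with equal normalized counts in both samples. Working with logarithms avoids overflow when many counts enter a geometric mean.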
Pages: 12