Read Mapping and Transcript Assembly: A Scalable and High-Throughput Workflow for the Processing and Analysis of Ribonucleic Acid Sequencing Data

Cited by: 15
Authors
Peri, Sateesh [1]
Roberts, Sarah [2]
Kreko, Isabella R. [3]
McHan, Lauren B. [3]
Naron, Alexandra [3]
Ram, Archana [3]
Murphy, Rebecca L. [4]
Lyons, Eric [1, 2]
Gregory, Brian D. [5]
Devisetty, Upendra K. [2]
Nelson, Andrew D. L. [6]
Affiliations
[1] Univ Arizona, Genet Grad Interdisciplinary Grp, Tucson, AZ USA
[2] Univ Arizona, CyVerse, Tucson, AZ USA
[3] Univ Arizona, Sch Plant Sci, LIVE For Plants Summer Res Program, Tucson, AZ USA
[4] Centenary Coll Louisiana, Biol Dept, Shreveport, LA USA
[5] Univ Penn, Dept Biol, Philadelphia, PA 19104 USA
[6] Cornell Univ, Boyce Thompson Inst, Ithaca, NY 14850 USA
Funding
National Science Foundation (USA)
Keywords
RNA-seq; transcriptomics; high throughput (-omics) techniques; bioinformatics; workflow; EXPRESSION ANALYSIS; ARABIDOPSIS; COGE;
DOI
10.3389/fgene.2019.01361
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Next-generation RNA sequencing is a powerful means of generating a snapshot of the transcriptomic state within a cell, tissue, or whole organism. As the questions addressed by RNA sequencing (RNA-seq) become both more complex and more numerous, there is a need to simplify RNA-seq processing workflows and to make them more efficient, more interoperable, and capable of handling both large and small datasets. This is especially important for researchers who need to process hundreds to tens of thousands of RNA-seq datasets. To address these needs, we have developed a scalable, user-friendly, and easily deployable analysis suite called RMTA (Read Mapping, Transcript Assembly). RMTA can process thousands of RNA-seq datasets, with features that include automated read quality analysis, filters for lowly expressed transcripts, and read counting for differential expression analysis. RMTA is containerized using Docker for easy deployment within any compute environment [cloud, local, or high-performance computing (HPC)] and is available as two apps in CyVerse's Discovery Environment: one for normal use and one specifically designed for introducing undergraduates and high school students to RNA-seq analysis. For extremely large datasets (tens of thousands of FASTQ files), we developed a high-throughput, scalable, and parallelized version of RMTA optimized for launching on the Open Science Grid (OSG) from within the Discovery Environment. OSG-RMTA allows users to use the Discovery Environment for data management, parallelization, and job submission to the OSG, and then to employ the OSG for distributed, high-throughput computing. Alternatively, OSG-RMTA can be run directly on the OSG through the command line. RMTA is designed to be useful for data scientists of any skill level who are interested in rapidly and reproducibly analyzing large RNA-seq datasets.
Pages: 9