Ultrafast functional profiling of RNA-seq data for nonmodel organisms

Cited by: 16
Authors
Liu, Peng [1 ]
Ewald, Jessica [1 ]
Galvez, Jose Hector [2 ,3 ]
Head, Jessica [1 ]
Crump, Doug [4 ]
Bourque, Guillaume [2 ,3 ]
Basu, Niladri [1 ]
Xia, Jianguo [1 ,2 ]
Affiliations
[1] McGill Univ, Fac Agr & Environm Sci, Montreal, PQ H9X 3V9, Canada
[2] McGill Univ, Dept Human Genet, Montreal, PQ H3A 0C7, Canada
[3] McGill Univ, Canadian Ctr Computat Genom, Montreal, PQ H3A 0G1, Canada
[4] Natl Wildlife Res Ctr, Environm & Climate Change Canada, Ottawa, ON K1A 0H3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1101/gr.269894.120
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Computational time and cost remain a major bottleneck for RNA-seq data analysis of nonmodel organisms without reference genomes. To address this challenge, we have developed Seq2Fun, a novel, all-in-one, ultrafast tool to directly perform functional quantification of RNA-seq reads without transcriptome de novo assembly. The pipeline starts with raw read quality control: sequencing error correction, removing poly(A) tails, and joining overlapped paired-end reads. It then conducts a DNA-to-protein search by translating each read into all possible amino acid fragments and subsequently identifies possible homologous sequences in a well-curated protein database. Finally, the pipeline generates several informative outputs including gene abundance tables, pathway and species hit tables, an HTML report to visualize the results, and an output of clean reads annotated with mapped genes ready for downstream analysis. Seq2Fun does not have any intermediate steps of file writing and loading, making I/O very efficient. Seq2Fun is written in C++ and can run on a personal computer with a limited number of CPUs and memory. It can process >2,000,000 reads/min and is >120 times faster than conventional workflows based on de novo assembly, while maintaining high accuracy in our various test data sets.
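The abstract's step of "translating each read into all possible amino acid fragments" can be illustrated with a generic six-frame translation. The Python sketch below is not Seq2Fun's code (the tool itself is written in C++, and its internals are not described in this record). The function names, the `min_len` cutoff, and the choice to split fragments at stop codons are all assumptions made for illustration, using the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_fragments(read: str, min_len: int = 2) -> list[str]:
    """Translate a read in all six frames and split at stop codons.

    min_len is a hypothetical cutoff for discarding very short fragments.
    """
    read = read.upper()
    fragments = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(
                frag for frag in protein.split("*") if len(frag) >= min_len
            )
    return fragments


if __name__ == "__main__":
    print(six_frame_fragments("ATGGCCTAA"))
```

Each resulting fragment would then be a candidate query against a protein database; how Seq2Fun scores or filters such fragments is not stated in the abstract.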
Pages: 713-720
Page count: 8