RNA-seq data science: From raw data to effective interpretation

Cited by: 57
Authors
Deshpande, Dhrithi [1]
Chhugani, Karishma [1]
Chang, Yutong [1]
Karlsberg, Aaron [2]
Loeffler, Caitlin [3]
Zhang, Jinyang [4]
Muszynska, Agata [5, 6]
Munteanu, Viorel [7]
Yang, Harry [8]
Rotman, Jeremy [2]
Tao, Laura [9]
Balliu, Brunilda [9]
Tseng, Elizabeth [10]
Eskin, Eleazar [3, 9, 11]
Zhao, Fangqing [4, 12]
Mohammadi, Pejman [13]
Labaj, Pawel P. [5, 14]
Mangul, Serghei [2, 15]
Affiliations
[1] USC Alfred E Mann Sch Pharm & Pharmaceut Sci, Dept Pharmacol & Pharmaceut Sci, Los Angeles, CA USA
[2] USC Alfred E Mann Sch Pharm & Pharmaceut Sci, Dept Clin Pharm, Los Angeles, CA 90089 USA
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA USA
[4] Chinese Acad Sci, Beijing Inst Life Sci, Beijing, Peoples R China
[5] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[6] Silesian Tech Univ, Inst Automat Control Elect & Comp Sci, Gliwice, Poland
[7] Tech Univ Moldova, Dept Comp Informat & Microelect, Kishinev, Moldova
[8] Univ Calif Los Angeles, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA USA
[9] UCLA, Dept Computat Med, CHS, David Geffen Sch Med, Los Angeles, CA USA
[10] Pacific Biosci, Menlo Pk, CA USA
[11] UCLA, Dept Human Genet, David Geffen Sch Med, Los Angeles, CA USA
[12] Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Key Lab Syst Biol, Hangzhou, Peoples R China
[13] Scripps Res Inst, Dept Integrat Struct & Computat Biol, La Jolla, CA USA
[14] Boku Univ Vienna, Dept Biotechnol, Vienna, Austria
[15] USC Dornsife Coll Letters Arts & Sci, Dept Quantitat & Computat Biol, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
Keywords
RNA sequencing; transcriptome quantification; differential gene expression; high throughput sequencing; read alignment; bioinformatics; circular RNAs; expression analysis; differential gene; sequencing data; quantification; alignment; reads; landscape; framework; transcriptome
DOI
10.3389/fgene.2023.997383
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
Pages: 12