Massive mining of publicly available RNA-seq data from human and mouse

Cited by: 433
Authors
Lachmann, Alexander [1, 2, 3, 4]
Torre, Denis [1, 2, 3, 4]
Keenan, Alexandra B. [1, 2, 3, 4]
Jagodnik, Kathleen M. [1, 2, 3, 4]
Lee, Hoyjin J. [1, 2, 3, 4]
Wang, Lily [1, 2, 3, 4]
Silverstein, Moshe C. [1, 2, 3, 4]
Ma'ayan, Avi [1, 2, 3, 4]
Affiliations
[1] Icahn Sch Med Mt Sinai, Dept Pharmacol Sci, One Gustave L Levy Place, Box 1603, New York, NY 10029 USA
[2] Icahn Sch Med Mt Sinai, Mt Sinai Ctr Bioinformat, One Gustave L Levy Place, Box 1603, New York, NY 10029 USA
[3] Icahn Sch Med Mt Sinai, DCIC, Lib Integrated Network Based Cellular Signatures, Big Data Knowledge, BD2K, LINCS, One Gustave L Levy Place, Box 1603, New York, NY 10029 USA
[4] Icahn Sch Med Mt Sinai, KMC, IDC, One Gustave L Levy Place, Box 1603, New York, NY 10029 USA
Funding
National Institutes of Health (NIH), USA
Keywords
GENE-EXPRESSION; ALIGNMENT; TOOL; ULTRAFAST; ENCYCLOPEDIA; INTEGRATION; REPOSITORY; ONTOLOGY; ENRICHR;
DOI
10.1038/s41467-018-03751-6
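Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract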
RNA sequencing (RNA-seq) is the leading technology for genome-wide transcript quantification. However, publicly available RNA-seq data is currently provided mostly in raw form, a significant barrier for global and integrative retrospective analyses. ARCHS4 is a web resource that makes the majority of published RNA-seq data from human and mouse available at the gene and transcript levels. For developing ARCHS4, available FASTQ files from RNA-seq experiments from the Gene Expression Omnibus (GEO) were aligned using a cloud-based infrastructure. In total 187,946 samples are accessible through ARCHS4 with 103,083 mouse and 84,863 human. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene pages that provide average expression across cell lines and tissues, top co-expressed genes for each gene, and predicted biological functions and protein-protein interactions for each gene based on prior knowledge combined with co-expression.
Pages: 10