Integrating diverse databases into an unified analysis framework: a Galaxy approach

Cited by: 39
Authors
Blankenberg, Daniel [1, 2]
Coraor, Nathan [1, 2]
Von Kuster, Gregory [1, 2]
Taylor, James [1, 3, 4]
Nekrutenko, Anton [1, 2]
Affiliations
[1] Penn State Univ, Galaxy Project, University Pk, PA 16802 USA
[2] Penn State Univ, Huck Inst Life Sci, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[3] Emory Univ, Dept Biol, Atlanta, GA 30322 USA
[4] Emory Univ, Dept Math & Comp Sci, Atlanta, GA 30322 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2011
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/database/bar011
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources.
Pages: 9