Petabyte-scale innovations at the European Nucleotide Archive

Cited by: 58
Authors
Cochrane, Guy [1]
Akhtar, Ruth [1]
Bonfield, James [2]
Bower, Lawrence [1]
Demiralp, Fehmi [1]
Faruque, Nadeem [1]
Gibson, Richard [1]
Hoad, Gemma [1]
Hubbard, Tim [2]
Hunter, Christopher [1]
Jang, Mikyung [1]
Juhos, Szilveszter [1]
Leinonen, Rasko [1]
Leonard, Steven [2]
Lin, Quan [1]
Lopez, Rodrigo [1]
Lorenc, Dariusz [1]
McWilliam, Hamish [1]
Mukherjee, Gaurab [1]
Plaister, Sheila [1]
Radhakrishnan, Rajesh [1]
Robinson, Stephen [1]
Sobhany, Siamak [1]
ten Hoopen, Petra [1]
Vaughan, Robert [1]
Zalunin, Vadim [1]
Birney, Ewan [1]
Affiliations
[1] EMBL European Bioinformatics Institute, Cambridge CB10 1SD, England
[2] Sanger Institute, Cambridge CB10 1SA, England
Funding
Wellcome Trust (UK)
Keywords
DATABASE; SEQUENCE
DOI
10.1093/nar/gkn765
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.
Pages: D19-D25
Page count: 7