The Ensembl analysis pipeline

Cited by: 75
Authors
Potter, SC
Clarke, L
Curwen, V
Keenan, S
Mongin, E
Searle, SMJ
Stabenau, A
Storey, R
Clamp, M
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SD, England
[2] Wellcome Trust Sanger Inst, Cambridge CB10 1SD, England
[3] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[4] Broad Inst, Cambridge, MA 02141 USA
DOI
10.1101/gr.1859804
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The Ensembl pipeline is an extension to the Ensembl system that allows automated annotation of genomic sequence. The software comprises two parts. First, there is a set of Perl modules ("Runnables" and "RunnableDBs") that are "wrappers" for a variety of commonly used analysis tools. These retrieve sequence data from a relational database, run the analysis, and write the results back to the database. They inherit from a common interface, which simplifies the writing of new wrapper modules. On top of this sits a job submission system (the "RuleManager") that allows efficient and reliable submission of large numbers of jobs to a compute farm. Here we describe the fundamental software components of the pipeline, and we also highlight some features of the Sanger installation that were necessary to enable the pipeline to scale to whole-genome analysis.
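The split the abstract describes (a tool wrapper, a database-aware wrapper around it, and a submission layer on top) can be illustrated with a minimal sketch. This is not the Ensembl code, which is written in Perl. The class names echo the abstract, but every method name, the `GCContent` toy analysis, the `submit_pending` function, and the dictionary standing in for the relational database are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Common interface: wraps one analysis tool and knows nothing about storage."""

    @abstractmethod
    def run(self, sequence):
        """Return a list of (feature_name, value) tuples for one sequence."""


class GCContent(Runnable):
    """Toy stand-in for a real analysis program (hypothetical, for illustration)."""

    def run(self, sequence):
        if not sequence:
            return []
        gc = sum(base in "GC" for base in sequence.upper())
        return [("gc_fraction", gc / len(sequence))]


class RunnableDB:
    """Fetches input from the store, runs the Runnable, and writes results back."""

    def __init__(self, store, runnable):
        self.store = store          # a dict standing in for the relational database
        self.runnable = runnable

    def process(self, seq_id):
        sequence = self.store["sequences"][seq_id]                      # fetch
        features = self.runnable.run(sequence)                          # run
        self.store["results"].setdefault(seq_id, []).extend(features)   # write


def submit_pending(store, runnable_db):
    """Hypothetical stand-in for a job-submission layer.

    Runs the analysis for every sequence that has no results yet and
    returns the ids it processed. A real system would dispatch jobs to
    a compute farm instead of calling process() in a loop.
    """
    processed = []
    for seq_id in store["sequences"]:
        if seq_id not in store["results"]:
            runnable_db.process(seq_id)
            processed.append(seq_id)
    return processed
```

Under these assumptions, adding a new analysis means writing one more `Runnable` subclass; the fetch and write-back logic in `RunnableDB` and the submission loop stay unchanged, which is the benefit the abstract attributes to the common interface.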
Pages: 934-941
Page count: 8