Engineering a software tool for gene structure prediction in higher organisms

Cited by: 248
Authors
Gremme, G
Brendel, V
Sparks, ME
Kurtz, S
Affiliations
[1] Univ Hamburg, Zentrum Bioinformat, D-20146 Hamburg, Germany
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[3] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA 50011 USA
Keywords
computational biology; genome annotation; similarity-based gene structure prediction; intron cutout technique; incremental updates
DOI
10.1016/j.infsof.2005.09.005
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The research area now commonly called 'bioinformatics' has brought together biologists, computer scientists, statisticians, and scientists of many other fields of expertise to work on computational solutions to biological problems. A large number of algorithms and software packages are freely available for many specific tasks, such as sequence alignment, molecular phylogeny reconstruction, or protein structure determination. Rapidly changing needs and demands on data handling capacity challenge the application providers to consistently keep pace. In practice, this has led to many incremental advances and re-writing of code that present the user community with confusing options and a large overhead from nonstandardized implementations that need to be integrated into existing work flows. This situation gives much scope for contributions by software engineers. In this article, we describe an example of engineering a software tool for a specific bioinformatics task known as spliced alignment. The problem was motivated by disabling limitations in an original, ad hoc, and yet widely popular implementation by one of the authors. The present collaboration has led to a robust, highly versatile, and extensible tool (named GenomeThreader) that not only overcomes the limitations of the earlier implementation but greatly improves space and time requirements. © 2005 Elsevier B.V. All rights reserved.
Pages: 965-978
Page count: 14