ASSESSMENT OF PROTEIN CODING MEASURES

Times cited: 259
Authors
FICKETT, JW [1]
TUNG, CS [1]
Affiliation
[1] LOS ALAMOS NATL LAB, CTR HUMAN GENOME STUDIES, LOS ALAMOS, NM 87545 USA
DOI
10.1093/nar/20.24.6441
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
A number of methods for recognizing protein-coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms lie one or more coding measures: functions that produce, given any sample window of sequence, a number or vector intended to measure the degree to which the sample resembles a window of 'typical' exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure, counting oligomers, is more effective than any of the more sophisticated measures. Different measures contain different information. However, there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
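The abstract defines a coding measure as a function that maps a sequence window to a score, and singles out oligomer counting as the most effective kind. The Python sketch below illustrates one common oligomer-counting variant, an in-frame hexamer log-likelihood ratio; it is an illustration under stated assumptions, not the authors' benchmarked implementation. The function names (count_kmers, train_log_ratios, coding_score), the hexamer length, the pseudocount of 1, counting coding training data only at codon-aligned offsets while counting noncoding data at every offset, and taking the best of three reading frames are all assumptions made here.

```python
import math
from collections import Counter


def count_kmers(seqs, k=6, step=1):
    """Count k-mers in each sequence, sampling start positions every `step` bases."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - k + 1, step):
            counts[s[i:i + k]] += 1
    return counts


def train_log_ratios(coding, noncoding, k=6, pseudocount=1.0):
    """Per-k-mer log(P_coding / P_noncoding) with additive (pseudocount) smoothing.

    Assumption: coding training sequences start in frame, so their k-mers are
    counted at codon-aligned offsets (step 3); noncoding k-mers use every offset.
    """
    c = count_kmers(coding, k, step=3)
    n = count_kmers(noncoding, k, step=1)
    vocab = set(c) | set(n)
    c_total = sum(c.values()) + pseudocount * len(vocab)
    n_total = sum(n.values()) + pseudocount * len(vocab)
    return {
        w: math.log((c[w] + pseudocount) / c_total)
        - math.log((n[w] + pseudocount) / n_total)
        for w in vocab
    }


def coding_score(window, table, k=6):
    """Mean log-ratio of codon-aligned k-mers, maximized over the three frames.

    Unseen k-mers contribute 0 (no evidence either way). Returns -inf if the
    window is shorter than k.
    """
    window = window.upper()
    best = float("-inf")
    for frame in range(3):
        scores = [table.get(window[i:i + k], 0.0)
                  for i in range(frame, len(window) - k + 1, 3)]
        if scores:
            best = max(best, sum(scores) / len(scores))
    return best


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    table = train_log_ratios(["GCTGAA" * 10], ["AT" * 30])
    print(coding_score("GCTGAA" * 3, table))  # positive: resembles the coding set
    print(coding_score("AT" * 9, table))      # negative: resembles the noncoding set
```

Scanning a longer sequence would apply coding_score to successive windows; the window length and step are left to the caller, since any fixed values here would be an assumption rather than the benchmark's settings.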
Pages: 6441-6450
Page count: 10