A tool for analyzing and annotating genomic sequences

被引:174
作者
Huang, XQ [1 ]
Adams, MD [1 ]
Zhou, H [1 ]
Kerlavage, AR [1 ]
机构
[1] INST GENOMIC RES,ROCKVILLE,MD 20850
关键词
D O I
10.1006/geno.1997.4984
中图分类号
Q81 [生物工程学(生物技术)]; Q93 [微生物学];
学科分类号
071005 ; 0836 ; 090102 ; 100705 ;
摘要
We describe a tool for analyzing and annotating large genomic sequences containing introns. The analysis and annotation tool (AAT) includes two sets of programs, one for comparing the query sequence with a protein database and the other for comparing the query with a cDNA database. Each set contains a fast database search program and a rigorous alignment program. The database search program quickly identifies regions of the query sequence that are similar to a database sequence. Then the alignment program constructs an optimal alignment for each region and the database sequence. The alignment program also reports the coordinates of exons in the query sequence. Pairwise alignments of the query sequence with protein and cDNA database sequences are combined into multiple sequence alignments, which provide a view of all protein and cDNA sequences matching a query region. On a data set of 570 DNA sequences, AAT identified 94% of coding nucleotides correctly and 74% of exons exactly. Results of analyzing a human BAC sequence with the AAT tool are also presented. The AAT tool reduces the labor-intensive work of locating the exons of the query sequence and improves the process of defining intron-exon boundaries by using the wealth of available protein and cDNA data. (C) 1997 Academic Press.
引用
收藏
页码:37 / 45
页数:9
相关论文
共 19 条
[1]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[2]  
[Anonymous], P RECOMB 97 1 INT C
[3]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[4]   Evaluation of gene structure prediction programs [J].
Burset, M ;
Guigo, R .
GENOMICS, 1996, 34 (03) :353-367
[5]  
CHAO KM, 1995, COMPUT APPL BIOSCI, V11, P147
[6]   The gene identification problem: An overview for developers [J].
Fickett, JW .
COMPUTERS & CHEMISTRY, 1996, 20 (01) :103-118
[7]   Gene recognition via spliced sequence alignment [J].
Gelfand, MS ;
Mironov, AA ;
Pevzner, PA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (17) :9061-9066
[8]   IDENTIFICATION OF PROTEIN CODING REGIONS BY DATABASE SIMILARITY SEARCH [J].
GISH, W ;
STATES, DJ .
NATURE GENETICS, 1993, 3 (03) :266-272
[9]  
Guan XJ, 1996, COMPUT APPL BIOSCI, V12, P31
[10]  
Huang X, 1996, Microb Comp Genomics, V1, P281