Genome annotation

Cited by: 15
Authors
Aubourg, S [1]
Rouzé, P [1]
Affiliations
[1] State Univ Ghent VIB, Dept Plant Genet, Lab Inst Natl Rech Agron France, B-9000 Ghent, Belgium
Keywords
annotation; Arabidopsis; bioinformatics; database; gene prediction; genomics
DOI
10.1016/S0981-9428(01)01242-6
Chinese Library Classification (CLC) number
Q94 [Botany]
Subject classification code
071001
Abstract
Today, the public international sequence databases contain more than nine billion nucleotides and the flow of new sequences is increasing dramatically. For scientists, the challenge is to exploit this huge amount of sequences. To extract biological knowledge from anonymous genomic sequences is the main objective of genome annotation. To meet the expectations of scientists, allowing them to use genomic knowledge for further experimentation as quickly as possible, the extensive use of computer tools is needed to minimize the slow and costly human interventions. This is the reason why annotation is often synonymous with prediction. The annotation work is divided into two steps: structural annotation, which consists mainly of localizing gene elements; and functional annotation, which aims at assigning a biochemical function to the deduced gene products. The different tools and strategies used to convert sequences to useful data will be discussed in detail with their advantages and bottlenecks. By focusing on plant genomes and especially on the Arabidopsis thaliana genome, the general results and their different display will be presented. The international annotation effort allows us to have an interesting overview of the Arabidopsis genome organization: general gene features and functions, classification into multigene families, importance of duplication events and chromosome structure. Furthermore, the limits and errors of these annotations are highlighted in order to use the sequence databases at their best and to consider some novel approaches to deepen the understanding of the regulation and the biological function of the genes. © 2001 Éditions scientifiques et médicales Elsevier SAS.
Pages: 181-193
Page count: 13