Reevaluating human gene annotation: A second-generation analysis of chromosome 22

Cited by: 62
Authors
Collins, JE [1]
Goward, ME [1]
Cole, CG [1]
Smink, LJ [1]
Huckle, EJ [1]
Knowles, S [1]
Bye, JM [1]
Beare, DM [1]
Dunham, I [1]
Affiliation
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
DOI
10.1101/gr.695703
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.
Pages: 27-36
Number of pages: 10