Identification of pseudogenes in the Drosophila melanogaster genome

Cited by: 78
Authors
Harrison, PM [1]
Milburn, D [1]
Zhang, Z [1]
Bertone, P [1]
Gerstein, M [1]
Affiliation
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
DOI
10.1093/nar/gkg169
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, caused by frameshifts and premature stop codons. They are classed as either processed pseudogenes (made by reverse transcription from an mRNA) or duplicated pseudogenes, arising from duplication in the genomic DNA and subsequent disablement. Historically, there is anecdotal evidence that the fruit fly (Drosophila melanogaster) has few pseudogenes. Investigators have linked this to a high deletion rate of genomic DNA, for which there is evidence from genetic experiments on genome size. Here, we apply a homology-based pipeline that was developed previously to identify pseudogenes in other eukaryotic genomes, to the fruit fly, so as to derive the first complete survey of its pseudogene population. We find approximately 100 pseudogenes, with at least a sixth of these as candidate processed pseudogenes. This gives a much lower proportion of pseudogenes (compared with the size of the proteome) than in the genomes of other eukaryotes for which data are available (human, nematode and budding yeast). Closest matching proteins to Drosophila pseudogenes are significantly longer than the average protein in its proteome (up to ~60% more than the average protein's length), in contrast to the situation in the three other eukaryotic genomes. This may be due to the persistence of fragments of longer genes. In the fly pseudogene population, we found most pseudogenes for serine proteases (which are more abundant in the Drosophila lineage compared with the other eukaryotes), immunoglobulin-motif-containing proteins and cytochromes P450. Data on the sequences and positions of the putative pseudogenes are available at: http://www.pseudogene.org/fly.
The detection of a small number of pseudogenes in the Drosophila genome and the higher mean length for the closest matching proteins to pseudogenes (possibly because remnants of genes encoding longer proteins are more likely to persist) are further evidence for a high deletion rate of genomic DNA in the fruit fly. The data are useful for molecular evolution study in Drosophila.
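The abstract notes that pseudogenes can be recognized from frameshifts and premature stop codons in an apparent coding sequence. The following is a minimal sketch of that idea only, not the authors' homology-based pipeline: the function names are hypothetical, the standard stop codons (TAA, TAG, TGA) are assumed, and a sequence length that is not a multiple of three is used as a crude stand-in for a frameshift, which a real pipeline would infer from an alignment to a matching protein.

```python
# Toy illustration (assumption: NOT the pipeline described in the paper).
# Flags an apparent coding sequence as "disabled" if it contains an
# in-frame stop codon before its last codon, or if its length is not a
# multiple of three (a crude proxy for a frameshift).

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def find_premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stops before the final codon."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    # Exclude the last complete codon, which is the expected terminator.
    return [
        i for i in range(n_codons - 1)
        if seq[3 * i:3 * i + 3] in STOP_CODONS
    ]


def has_frameshift_length(cds: str) -> bool:
    """Crude frameshift indicator: length is not divisible by three."""
    return len(cds) % 3 != 0


def looks_disabled(cds: str) -> bool:
    """True if either kind of disruption is detected."""
    return bool(find_premature_stops(cds)) or has_frameshift_length(cds)


if __name__ == "__main__":
    for name, seq in [
        ("intact", "ATGGCCAAATAA"),
        ("premature stop", "ATGTAAGCCAAATAA"),
        ("length shift", "ATGGCAAATAA"),
    ]:
        print(name, looks_disabled(seq))
```

In practice, disruptions are judged relative to a homologous protein, so a check like this would only be a first filter on sequences that already align to a known gene.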
Pages: 1033-1037
Page count: 5