A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection

Cited by: 128
Authors
Hanada, Kousuke
Zhang, Xu
Borevitz, Justin O.
Li, Wen-Hsiung
Shiu, Shin-Han [1]
Affiliations
[1] Michigan State Univ, Dept Plant Biol, E Lansing, MI 48824 USA
[2] Univ Chicago, Dept Ecol & Evolut, Chicago, IL 60637 USA
DOI
10.1101/gr.5836207
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Large-scale cDNA sequencing projects and tiling array studies have revealed the presence of many unannotated genes. For protein coding genes, small coding sequences may not be identified by gene finders because of the conservative nature of prediction algorithms. In this study, we identified small open reading frames (sORFs) with high coding potential by a simple gene finding method (Coding Index, CI) based on the nucleotide composition bias found in most coding sequences. Applying this method to 18 Arabidopsis thaliana and 84 yeast sORF genes with evidence of expression at the protein level gives 100% accurate prediction. In the A. thaliana genome, we identified 7159 sORFs that are likely coding sequences (coding sORFs) with the CI measure at the 1% false-positive rate. To determine if these coding sORFs are parts of functional genes, we evaluated each coding sORF for evidence of transcription or evolutionary conservation. At the 5% false-positive rate, we found that 2996 coding sORFs are likely expressed in at least one experimental condition of the A. thaliana tiling array data. In addition, the evolutionary conservation of each A. thaliana sORF was examined within A. thaliana or between A. thaliana and five plants with complete or partial genome sequences. In 3997 coding sORFs with readily identifiable homologous sequences, 2376 are subject to purifying selection at the 1% false-positive rate. After eliminating coding sORFs with similarity to known transposable elements and those that are likely missing exons of known genes, the remaining 3241 coding sORFs with either evidence of transcription or purifying selection likely belong to novel coding genes in the A. thaliana genome.
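The abstract does not spell out how candidate sORFs are enumerated before the Coding Index is applied, so the following is only a minimal sketch of one conventional way to list ATG-to-stop open reading frames on both strands. The function name `find_sorfs` and the length bounds `min_codons` and `max_codons` are hypothetical choices for illustration, not values taken from the paper, and the sketch does not implement the CI measure itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sorfs(seq, min_codons=10, max_codons=100):
    """List small ORFs (ATG ... stop) in all six reading frames.

    min_codons / max_codons are illustrative thresholds (assumptions),
    counted from the start codon up to, but not including, the stop codon.
    Returns tuples (strand, start, end, orf_sequence); coordinates are
    0-based, end-exclusive, and relative to the scanned strand.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the current ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return hits


if __name__ == "__main__":
    # Toy sequence: one 12-codon ORF embedded in short flanking sequence.
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    for hit in find_sorfs(toy):
        print(hit)
```

Candidates listed this way would still need a separate scoring step (such as a composition-based measure) and a false-positive threshold before being treated as likely coding sequences.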
Pages: 632-640
Number of pages: 9