Computational prediction of novel non-coding RNAs in Arabidopsis thaliana

Cited by: 35
Authors
Song, Dandan [2]
Yang, Yang [3]
Yu, Bin [1]
Zheng, Binglian [1]
Deng, Zhidong [2]
Lu, Bao-Liang [3, 4]
Chen, Xuemei [1]
Jiang, Tao [5]
Affiliations
[1] Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] Shanghai Ctr Syst Biomed, Lab Computat Biol, Shanghai 200240, Peoples R China
[5] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Keywords
COMPARATIVE GENOMICS; STRUCTURED RNAS; FUNCTIONAL RNAS; SEQUENCE; IDENTIFICATION; ALIGNMENT; REVEALS;
DOI
10.1186/1471-2105-10-S1-S36
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a great number of transcripts, a large portion of which have little or no protein-coding capacity. This unexpected finding suggests that the currently known repertoire of ncRNAs may represent only a small fraction of the ncRNAs in these organisms. Efficient and effective prediction of ncRNAs has therefore become an important task in bioinformatics. Among the available computational methods, the comparative genomics approach appears to be the most powerful for detecting ncRNAs, and the recent completion of sequencing for several major plant genomes has made this approach feasible for plants.

Results: We have developed a pipeline to predict novel ncRNAs in the Arabidopsis (Arabidopsis thaliana) genome. It starts by comparing the expressed intergenic regions of Arabidopsis, as reported by two whole-genome high-density oligo-probe arrays from the literature, with the intergenic nucleotide sequences of all completely sequenced plant genomes, including rice (Oryza sativa), poplar (Populus trichocarpa), grape (Vitis vinifera), and papaya (Carica papaya). Using multiple sequence alignment, a popular ncRNA prediction program (RNAz), wet-bench experimental validation, protein-coding potential analysis, and stringent screening against various ncRNA databases, the pipeline yielded 16 families of novel ncRNAs (21 ncRNAs in total).

Conclusion: In this paper, we undertake a genome-wide search for novel ncRNAs in the Arabidopsis genome using a comparative genomics approach. The novel ncRNAs identified are evolutionarily conserved between Arabidopsis and other recently sequenced plants, and may perform novel biological functions.
Pages: 12