Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants

Cited by: 44
Authors
Negri, Tatianne da Costa [1]
Luz Alves, Wonder Alexandre [1]
Bugatti, Pedro Henrique [2]
Maeda Saito, Priscila Tiemi [2]
Domingues, Douglas Silva [3]
Paschoal, Alexandre Rossi [4]
Affiliations
[1] UNINOVE, Informat & Knowledge Management Grad Program, Sao Paulo, Brazil
[2] UTFPR, Dept Comp Sci, Cornelio Procopio, PR, Brazil
[3] Univ Estadual Paulista, Inst Biosci Rio Claro, Dept Bot, Sao Paulo, Brazil
[4] Fed Univ Technol Parana UTFPR, Dept Comp Sci, Apucarana, Brazil
Keywords
bioinformatics; tool; features; machine learning; long RNAs; pattern recognition; GENOME-WIDE IDENTIFICATION; LNCRNAS; TRANSCRIPTS; POPULUS; PROTEIN
DOI
10.1093/bib/bby034
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Long noncoding RNAs (lncRNAs) are a class of eukaryotic noncoding RNAs that has gained great attention in recent years as a higher layer of gene expression regulation in cells. There is, however, a lack of specific computational approaches to reliably predict lncRNAs in plants, which contrasts with the variety of prediction tools available for mammalian lncRNAs. This distinction is not obvious, given that the biological features and mechanisms generating lncRNAs in the cell likely differ between animals and plants. Considering this, we present a machine learning analysis and a classifier approach called RNAplonc (https://github.com/TatianneNegri/RNAplonc/) to identify lncRNAs in plants.
Results: Our feature selection analysis considered 5468 features, of which only 16 were used to robustly identify lncRNAs with the REPTree algorithm. These features were the basis for creating the model, which was trained with lncRNA and mRNA data from five plant species (thale cress, cucumber, soybean, poplar and Asian rice). After an extensive comparison with other tools widely used in plants (CPC, CPC2, CPAT and PLncPRO), we found that RNAplonc produced more reliable lncRNA predictions from plant transcripts, with the best result in 87.5% of cases across eight tests in eight species from the GreeNC database and four independent studies in monocotyledonous (Brachypodium) and eudicotyledonous (Populus and Gossypium) species.
Pages: 682-689
Number of pages: 8