PredLnc-GFStack: A Global Sequence Feature Based on a Stacked Ensemble Learning Method for Predicting lncRNAs from Transcripts

Cited by: 19
Authors
Liu, Shuai [1]
Zhao, Xiaohan [1]
Zhang, Guangyan [1]
Li, Weiyang [1]
Liu, Feng [2]
Liu, Shichao [1]
Zhang, Wen [1]
Affiliations
[1] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Hubei, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
lncRNA prediction; genetic algorithm; stacked ensemble learning; global sequence features; feature selection; LONG NONCODING RNAS; CD-HIT; ANNOTATION; PROTEIN; GENE;
DOI
10.3390/genes10090672
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Long non-coding RNAs (lncRNAs) are a class of RNAs longer than 200 base pairs (bps) that do not encode proteins; nevertheless, lncRNAs have many vital biological functions. The development of high-throughput sequencing technology has led to the discovery of a large number of novel transcripts, so computational methods for lncRNA prediction are in great demand. In this paper, we consider global sequence features and propose a stacked ensemble learning-based method, abbreviated as PredLnc-GFStack, to predict lncRNAs from transcripts. We extract the critical features from the candidate feature list using a genetic algorithm (GA) and then employ the stacked ensemble learning method to construct the PredLnc-GFStack model. Computational experiments show that PredLnc-GFStack outperforms several state-of-the-art methods for lncRNA prediction. Furthermore, PredLnc-GFStack demonstrates an outstanding ability for cross-species ncRNA prediction.
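The abstract names GA-based feature selection but gives no details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It evolves 0/1 masks over a candidate feature list with tournament selection, one-point crossover, bit-flip mutation, and elitism. The function `ga_select`, its parameters, and the toy fitness function are all hypothetical; in practice the fitness would presumably be a cross-validated classifier score on the selected features. The stacked ensemble step is not shown.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Return (best_mask, best_score) found by a simple generational GA.

    Hypothetical helper: each individual is a 0/1 mask over the candidate
    feature list; `fitness` maps a mask to a number (higher is better).
    Requires n_features >= 2 because of the one-point crossover.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


# Toy fitness (an assumption for illustration only): features 0, 3 and 5
# are "useful", and every selected feature costs a small penalty.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in USEFUL)
    return hits - 0.1 * sum(mask)


mask, score = ga_select(n_features=10, fitness=toy_fitness)
selected = [i for i, g in enumerate(mask) if g]
print(selected, round(score, 2))
```

Because the best mask is copied into every new generation, the best score never decreases from one generation to the next; the size penalty in the toy fitness pushes the search toward smaller feature subsets.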
Pages: 13
Related papers
59 in total
  • [1] LncRNA-ID: Long non-coding RNA IDentification using balanced random forests
    Achawanantakun, Rujira
    Chen, Jiao
    Sun, Yanni
    Zhang, Yuan
    [J]. BIOINFORMATICS, 2015, 31 (24) : 3897 - 3905
  • [2] [Anonymous], 1991, Handbook of genetic algorithms
  • [3] LncRNAnet: long non-coding RNA identification using deep learning
    Baek, Junghwan
    Lee, Byunghan
    Kwon, Sunyoung
    Yoon, Sungroh
    [J]. BIOINFORMATICS, 2018, 34 (22) : 3889 - 3897
  • [4] Considerations when investigating lncRNA function in vivo
    Bassett, Andrew R.
    Akhtar, Asifa
    Barlow, Denise P.
    Bird, Adrian P.
    Brockdorff, Neil
    Duboule, Denis
    Ephrussi, Anne
    Ferguson-Smith, Anne C.
    Gingeras, Thomas R.
    Haerty, Wilfried
    Higgs, Douglas R.
    Miska, Eric A.
    Ponting, Chris P.
    [J]. ELIFE, 2014, 3 : 1 - 14
  • [5] Long Noncoding RNAs: Cellular Address Codes in Development and Disease
    Batista, Pedro J.
    Chang, Howard Y.
    [J]. CELL, 2013, 152 (06) : 1298 - 1307
  • [6] Blickle T., 1995, Proceedings of the Sixth International Conference on Genetic Algorithms, V95, P9
  • [7] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [9] Bühlmann P, 2002, ANN STAT, V30, P927
  • [10] Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses
    Cabili, Moran N.
    Trapnell, Cole
    Goff, Loyal
    Koziol, Magdalena
    Tazon-Vega, Barbara
    Regev, Aviv
    Rinn, John L.
    [J]. GENES & DEVELOPMENT, 2011, 25 (18) : 1915 - 1927