Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression

Cited by: 5
Authors
Ulfenborg, Benjamin [1]
Klinga-Levan, Karin [1]
Olsson, Bjorn [1]
Affiliations
[1] Univ Skovde, Sch Biosci, Syst Biol Res Ctr, Skovde, Sweden
Keywords
miRNA prediction; miRNA discovery; RNA structure prediction; GenoScan; ensemble classifier; regression model; machine learning; RNA SECONDARY STRUCTURE; WEB SERVER; COMPUTATIONAL IDENTIFICATION; MICRORNA; PREDICTION; CLASSIFICATION; PRECURSORS; SOFTWARE; TOOL; SEQUENCES
DOI
10.1504/IJDMB.2015.072755
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance. Both the accuracy of secondary structure predictions and the accuracy of miRNA prediction are evaluated. In the benchmark of hairpin classification methods, the regression model achieved the highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and input sequence length had limited impact on hairpin classification performance.
Pages: 338-359
Page count: 22
Related papers
50 in total
  • [41] Genome-wide discovery of maternal effect variants
    Jack W Kent
    Charles P Peterson
    Thomas D Dyer
    Laura Almasy
    John Blangero
    BMC Proceedings, 3 (Suppl 7)
  • [42] Genome-wide discovery of human splicing branchpoints
    Mercer, Tim R.
    Clark, Michael B.
    Andersen, Stacey B.
    Brunck, Marion E.
    Haerty, Wilfried
    Crawford, Joanna
    Taft, Ryan J.
    Nielsen, Lars K.
    Dinger, Marcel E.
    Mattick, John S.
    GENOME RESEARCH, 2015, 25 (02) : 290 - 303
  • [43] Locus Discovery in Genome-wide Association Studies using Bivariate Analysis
    Warrington, Nicole M.
    Evans, David M.
    GENETIC EPIDEMIOLOGY, 2016, 40 (07) : 670 - 670
  • [44] Genome-wide epigenomic profiling for biomarker discovery
    René A. M. Dirks
    Hendrik G. Stunnenberg
    Hendrik Marks
    Clinical Epigenetics, 2016, 8
  • [45] Genome-wide discovery of human heart enhancers
    Narlikar, Leelavati
    Sakabe, Noboru J.
    Blanski, Alexander A.
    Arimura, Fabio E.
    Westlund, John M.
    Nobrega, Marcelo A.
    Ovcharenko, Ivan
    GENOME RESEARCH, 2010, 20 (03) : 381 - 392
  • [46] Genome-wide epigenomic profiling for biomarker discovery
    Dirks, Rene A. M.
    Stunnenberg, Hendrik G.
    Marks, Hendrik
    CLINICAL EPIGENETICS, 2016, 8
  • [47] Genome-wide approaches for cancer gene discovery
    Lizardi, Paul M.
    Forloni, Matteo
    Wajapeyee, Narendra
    TRENDS IN BIOTECHNOLOGY, 2011, 29 (11) : 558 - 568
  • [48] Genome-wide Discovery of Rare Riboswitches in Bacteria
    Arachchilage, Gayan Mirihana
    Atilho, Ruben
    Stav, Shira
    Higgs, Gadareth
    Breaker, Ronald
    FASEB JOURNAL, 2019, 33
  • [49] Multi-stage multi-locus analysis of genome-wide association studies using Random Forests and Logistic Regression
    Parisi, R.
    Bishop, D. T.
    Iles, M. M.
    Barrett, J. H.
    ANNALS OF HUMAN GENETICS, 2009, 73 : 666 - 666
  • [50] An In-memory Architecture for Machine Learning Classifier using Logistic Regression
    Saragada, Prasanna Kumar
    Rathod, Meghnath
    Das, Bishnu Prasad
    2019 IEEE INTERNATIONAL SYMPOSIUM ON SMART ELECTRONIC SYSTEMS (ISES 2019), 2019, : 209 - 214