Genome-wide discovery of miRNAs using ensembles of machine learning algorithms and logistic regression

Cited by: 5
Authors
Ulfenborg, Benjamin [1]
Klinga-Levan, Karin [1]
Olsson, Bjorn [1]
Affiliations
[1] Univ Skovde, Sch Biosci, Syst Biol Res Ctr, Skovde, Sweden
Keywords
miRNA prediction; miRNA discovery; RNA structure prediction; GenoScan; ensemble classifier; regression model; machine learning; RNA SECONDARY STRUCTURE; WEB SERVER; COMPUTATIONAL IDENTIFICATION; MICRORNA; PREDICTION; CLASSIFICATION; PRECURSORS; SOFTWARE; TOOL; SEQUENCES;
DOI
10.1504/IJDMB.2015.072755
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
In silico prediction of novel miRNAs from genomic sequences remains a challenging problem. This study presents a genome-wide miRNA discovery software package called GenoScan and evaluates two hairpin classification methods. These methods, one ensemble-based and one using logistic regression, were benchmarked along with 15 published methods. In addition, the sequence-folding step is addressed by investigating the impact of secondary structure prediction methods and the choice of input sequence length on prediction performance. Both the accuracy of the secondary structure predictions and the accuracy of the miRNA predictions are evaluated. In the benchmark of hairpin classification methods, the regression model achieved the highest classification accuracy. Of the structure prediction methods evaluated, ContextFold achieved the highest agreement between predicted and experimentally determined structures. However, both the choice of secondary structure prediction method and the input sequence length had limited impact on hairpin classification performance.
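The abstract does not say how agreement between predicted and experimentally determined structures was scored. As an illustration only, the sketch below computes base-pair sensitivity, positive predictive value, and F1 between two dot-bracket strings, which is one common way to score such agreement. The function names `base_pairs` and `pair_agreement` and the example structures are hypothetical and are not taken from GenoScan or the paper.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def pair_agreement(predicted, reference):
    """Score a predicted structure against a reference structure.

    Returns (sensitivity, ppv, f1) computed over base pairs.
    This metric is an assumption for illustration; the paper's own
    agreement measure is not stated in the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred = base_pairs(predicted)
    ref = base_pairs(reference)
    shared = len(pred & ref)
    sensitivity = shared / len(ref) if ref else 0.0
    ppv = shared / len(pred) if pred else 0.0
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv) if shared else 0.0
    return sensitivity, ppv, f1


if __name__ == "__main__":
    # Made-up hairpin structures, not data from the study.
    reference = "((((...))))"
    predicted = "(((.....)))"
    print(pair_agreement(predicted, reference))
```

Here the prediction recovers three of the four reference pairs and predicts no extra pairs, so sensitivity is 0.75 and positive predictive value is 1.0.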
Pages: 338-359
Page count: 22