Evaluation of machine learning models that predict lncRNA subcellular localization

Cited by: 0
Authors
Miller, Jason R. [1, 2]
Yi, Weijun [2]
Adjeroh, Donald A. [2]
Affiliations
[1] Hood Coll, Dept Comp Sci & Informat Technol, Frederick, MD 21701 USA
[2] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Funding
National Science Foundation (USA);
Keywords
RNALOCATE; RESOURCE; GENCODE;
DOI
10.1093/nargab/lqae125
Chinese Library Classification number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72-74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this 'middle exclusion' protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
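The effect of 'middle exclusion' described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration and is not taken from the paper: the Gaussian localization score, the single noisy feature, the cutoff of 1.0, the threshold classifier, and all function names are hypothetical. The sketch compares filtering before partitioning with partitioning first and testing on unfiltered genes.

```python
import random


def make_genes(n, seed=0):
    """Synthetic genes as (feature, score) pairs (illustrative assumption)."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)             # stand-in for a cytoplasmic-vs-nuclear value
        feature = score + rng.gauss(0.0, 2.0)   # weak, noisy predictor of the score
        genes.append((feature, score))
    return genes


def middle_exclusion(genes, cutoff=1.0):
    """Keep only genes with extreme scores; drop the middle range."""
    return [g for g in genes if abs(g[1]) >= cutoff]


def split(genes, test_fraction=0.2):
    """Partition into (train, test); genes are already in random order."""
    n_test = int(len(genes) * test_fraction)
    return genes[n_test:], genes[:n_test]


def fit_threshold(train):
    """Toy model: midpoint between the mean feature of each class."""
    pos = [f for f, s in train if s > 0]
    neg = [f for f, s in train if s <= 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, test):
    """Fraction of genes whose predicted class matches the sign of the score."""
    hits = sum((f > threshold) == (s > 0) for f, s in test)
    return hits / len(test)


genes = make_genes(20000)

# Protocol A: filter first, then partition (test set contains only extremes).
train_a, test_a = split(middle_exclusion(genes))
acc_filtered = accuracy(fit_threshold(train_a), test_a)

# Protocol B: partition first, filter only the training set, test on all genes.
train_b, test_b = split(genes)
acc_unfiltered = accuracy(fit_threshold(middle_exclusion(train_b)), test_b)

print(f"accuracy on middle-excluded test set: {acc_filtered:.3f}")
print(f"accuracy on unfiltered test set:      {acc_unfiltered:.3f}")
```

In this toy setting the same model family scores higher under Protocol A only because the hard, middle-range genes were removed from the test set. The gap is a property of the synthetic data and should not be read as a reproduction of the accuracies reported in the abstract.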
Pages: 9