MSLP: mRNA subcellular localization predictor based on machine learning techniques

Cited by: 9
Authors
Musleh, Saleh [1]
Islam, Mohammad Tariqul [2]
Qureshi, Rizwan [1]
Alajez, Nihad [3, 4]
Alam, Tanvir [1]
Affiliations
[1] Hamad Bin Khalifa Univ, Coll Sci & Engn, Doha, Qatar
[2] Southern Connecticut State Univ, Comp Sci Dept, New Haven, CT USA
[3] Hamad Bin Khalifa Univ, Qatar Biomed Res Inst QBRI, Translat Canc & Immun Ctr TC, Doha, Qatar
[4] Hamad Bin Khalifa Univ, Coll Hlth & Life Sci, Doha, Qatar
Keywords
RNA; mRNA; Machine learning; Sequence analysis; Localization prediction; Subcellular localization; NERVOUS-SYSTEM; RNALOCATE; SEQUENCES; RESOURCE
DOI
10.1186/s12859-023-05232-0
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, cell migration, and cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting great attention in the RNA community.

Methods: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. We propose a novel combination of four types of features, representing k-mer, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation, to feed into machine learning algorithms that predict the subcellular localization of mRNAs.

Results: Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method on ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study highlighted k-mer and PseKNC as more dominant than the other features for predicting cytoplasm, nucleus, and ER localizations. On the other hand, physicochemical properties and Z-curve based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of the features, providing better insight into the proposed approach.

Availability: We have implemented a Docker container and an API for end users to run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP.
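To make two of the feature types named in the abstract concrete, the following is a minimal Python sketch of k-mer frequencies and the standard cumulative Z-curve coordinates. It is an illustration under stated assumptions, not the MSLP implementation: the function names `kmer_frequencies` and `z_curve`, the default `k=3`, and the handling of ambiguous bases are all hypothetical choices made for this example, and PseKNC and the physicochemical features are not covered.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies over a fixed alphabet.

    Assumption for this sketch: DNA-style input is mapped to RNA (T -> U),
    and k-mers containing any other symbol are ignored.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def z_curve(seq):
    """Return the cumulative Z-curve coordinates (x_n, y_n, z_n).

    x_n = (A_n + G_n) - (C_n + U_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + U_n)   amino vs. keto
    z_n = (A_n + U_n) - (G_n + C_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, U_n count each base in the first n positions.
    Assumption for this sketch: ambiguous bases (e.g. N) are skipped,
    so the output can be shorter than the input.
    """
    seq = seq.upper().replace("T", "U")
    a = c = g = u = 0
    coords = []
    for base in seq:
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "U":
            u += 1
        else:
            continue  # skip symbols outside A/C/G/U
        coords.append(((a + g) - (c + u), (a + c) - (g + u), (a + u) - (g + c)))
    return coords
```

A fixed-length feature vector could then be formed, for example, by concatenating the k-mer frequencies with summary statistics of the Z-curve coordinates; how MSLP actually combines its features is not specified in this record.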
Pages: 23