MSLP: mRNA subcellular localization predictor based on machine learning techniques

Cited by: 9
Authors
Musleh, Saleh [1]
Islam, Mohammad Tariqul [2]
Qureshi, Rizwan [1]
Alajez, Nihad [3, 4]
Alam, Tanvir [1]
Affiliations
[1] Hamad Bin Khalifa Univ, Coll Sci & Engn, Doha, Qatar
[2] Southern Connecticut State Univ, Comp Sci Dept, New Haven, CT USA
[3] Hamad Bin Khalifa Univ, Qatar Biomed Res Inst QBRI, Translat Canc & Immun Ctr TC, Doha, Qatar
[4] Hamad Bin Khalifa Univ, Coll Hlth & Life Sci, Doha, Qatar
Keywords
RNA; mRNA; Machine learning; Sequence analysis; Localization prediction; Subcellular localization; NERVOUS-SYSTEM; RNALOCATE; SEQUENCES; RESOURCE
DOI
10.1186/s12859-023-05232-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression and cell migration, as well as in cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. Therefore, in silico approaches for this purpose are attracting great attention in the RNA community. Methods: In this article, we propose MSLP, a machine learning-based method to predict the subcellular localization of mRNAs. We propose a novel combination of four types of features, representing k-mer composition, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation, which are fed into machine learning algorithms to predict the subcellular localization of mRNAs. Results: Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction tasks on multiple benchmark datasets. We evaluated the performance of our method on ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed that k-mer and PseKNC features were more dominant than the other features for predicting cytoplasm, nucleus, and ER localizations. On the other hand, physicochemical properties and Z-curve based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of the features, providing better insight into the proposed approach. Availability: We have implemented a Docker container and an API for end users to run their sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub at: https://github.com/smusleh/MSLP.
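As a rough illustration of two of the four feature types named in the abstract, the sketch below computes normalized k-mer frequencies and a three-component Z-curve vector for a single sequence, then concatenates them. It is not the MSLP implementation: the function names, the default k, and the frequency-based Z-curve variant are assumptions made for this example, and PseKNC and physicochemical features are left out.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: U is mapped to T so RNA and cDNA inputs are treated alike


def kmer_frequencies(seq, k=2):
    """Return normalized counts of overlapping k-mers in lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def z_curve(seq):
    """Return the frequency-based Z-curve components (x, y, z) of a sequence."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n == 0:
        return [0.0, 0.0, 0.0]
    f = {base: seq.count(base) / n for base in ALPHABET}
    x = (f["A"] + f["G"]) - (f["C"] + f["T"])  # purine vs. pyrimidine
    y = (f["A"] + f["C"]) - (f["G"] + f["T"])  # amino vs. keto
    z = (f["A"] + f["T"]) - (f["C"] + f["G"])  # weak vs. strong hydrogen bonding
    return [x, y, z]


def feature_vector(seq, k=2):
    """Concatenate k-mer and Z-curve features (hypothetical helper, not MSLP's API)."""
    return kmer_frequencies(seq, k) + z_curve(seq)
```

With k = 2 the combined vector has 16 + 3 = 19 entries per sequence; a vector of this kind could then be passed to any standard classifier.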
Pages: 23