NRPreTo: A Machine Learning-Based Nuclear Receptor and Subfamily Prediction Tool

Times cited: 1
Authors
Madugula, Sita Sirisha [1]
Pandey, Suman [1]
Amalapurapu, Shreya [1,2]
Bozdag, Serdar [1,3,4]
Affiliations
[1] Univ North Texas, Dept Comp Sci & Engn, Denton, TX 76203 USA
[2] Univ North Texas, Texas Acad Math & Sci, Denton, TX 76203 USA
[3] Univ North Texas, Dept Math, Denton, TX 76203 USA
[4] Univ North Texas, BioDiscovery Inst, Denton, TX 76203 USA
Source
ACS OMEGA | 2023 / Vol. 8 / Issue 23
Funding
US National Institutes of Health;
Keywords
AMINO-ACID-COMPOSITION; BINDING PROTEINS; IDENTIFICATION; INFORMATION; DATABASE; SYSTEM;
DOI
10.1021/acsomega.3c00286
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The nuclear receptor (NR) superfamily includes phylogenetically related ligand-activated proteins, which play a key role in various cellular activities. NR proteins are subdivided into seven subfamilies based on their function, mechanism, and nature of the interacting ligand. Developing robust tools to identify NR could give insights into their functional relationships and involvement in disease pathways. Existing NR prediction tools only use a few types of sequence-based features and are tested on relatively similar independent datasets; thus, they may suffer from overfitting when extended to new genera of sequences. To address this problem, we developed Nuclear Receptor Prediction Tool (NRPreTo), a two-level NR prediction tool with a unique training approach where, in addition to the sequence-based features used by existing NR prediction tools, six additional feature groups depicting various physiochemical, structural, and evolutionary features of proteins were utilized. The first level of NRPreTo allows for the successful prediction of a query protein as NR or non-NR and further subclassifies the protein into one of the seven NR subfamilies in the second level. We developed Random Forest classifiers to test on benchmark datasets, as well as the entire human protein datasets from RefSeq and Human Protein Reference Database (HPRD). We observed that using additional feature groups improved the performance. We also observed that NRPreTo achieved high performance on the external datasets and predicted 59 novel NRs in the human proteome. The source code of NRPreTo is publicly available at https://github.com/bozdaglab/NRPreTo.
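The two-level scheme described in the abstract (NR versus non-NR first, then subfamily assignment) can be illustrated with a minimal sketch. This is not the NRPreTo implementation: the amino acid composition feature, the function names, the label strings, and the `ConstantModel` stand-in are all assumptions made for illustration. Any pair of fitted classifiers exposing a scikit-learn-style `predict` method, such as Random Forest models, could be passed in instead.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence.

    Assumption: this simple sequence-based feature stands in for the much
    richer feature groups mentioned in the abstract.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def two_level_predict(sequence, level1, level2, featurize=amino_acid_composition):
    """Cascade: level 1 decides NR vs non-NR, level 2 assigns a subfamily.

    `level1` and `level2` are any objects with a `predict(X)` method that
    takes a list of feature vectors and returns one label per vector.
    The label strings "NR" and "non-NR" are hypothetical.
    """
    features = [featurize(sequence)]
    if level1.predict(features)[0] != "NR":
        return ("non-NR", None)  # level 2 is skipped for non-NR proteins
    return ("NR", level2.predict(features)[0])


class ConstantModel:
    """Hypothetical stand-in for a fitted classifier; always returns one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]
```

For example, `two_level_predict("MKT", ConstantModel("NR"), ConstantModel("subfamily-1"))` runs both levels, whereas a level-1 model that returns "non-NR" short-circuits the cascade so the subfamily model is never consulted.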
Pages: 20379-20388
Page count: 10