IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation

Cited by: 27
Authors
Wang, Ning [1]
Zhang, Jun [2]
Liu, Bin [3, 4]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[4] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China
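Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China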
Keywords
Amino acids; Feature extraction; Protein sequence; Benchmark testing; DNA; RNA; Radio frequency; Nucleic acid-binding proteins identification; protein representation; random forest; PSSM and PSFM cross transformation; PSI-BLAST; PREDICTION; RESIDUES; KERNELS
DOI
10.1109/TCBB.2021.3069263
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline code
071010; 081704
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important classes of nucleic acid-binding proteins (NABPs) that play key roles in biological processes such as the replication, translation and transcription of genetic material. Some proteins, called DRBPs, bind to both DNA and RNA and also play a key role in gene expression. Identifying DBPs, RBPs and DRBPs is therefore important for studying protein-nucleic acid interactions. A growing number of computational methods have been proposed to identify DNA- or RNA-binding proteins automatically from protein sequences alone. One challenge is to design an effective protein representation method that converts protein sequences into fixed-dimension feature vectors. In this study, we propose a novel protein representation method, Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT), to represent protein sequences. PPCT captures the evolutionary information contained in the PSSM and the PSFM as well as the correlations between them. By combining PPCT with a two-layer framework based on the random forest algorithm, we built a new computational predictor, IDRBP-PPCT, to identify DBPs, RBPs and DRBPs. Experimental results on the independent dataset and on the tomato genome demonstrate the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT is freely available at http://bliulab.net/IDRBP-PPCT.
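The core idea in the abstract, turning a variable-length protein into a fixed-dimension vector while keeping the relationship between its PSSM and PSFM, can be illustrated with a minimal sketch. The formula used here is an assumption for illustration and is not taken from the paper: for each lag d and each amino-acid pair (a, b), it averages PSSM[i][a] × PSFM[i+d][b] over all valid positions i, so the output always has 20 × 20 × max_lag entries whatever the sequence length. The function name `cross_transform`, the parameter `max_lag` and the toy profiles are hypothetical; the input matrices are assumed to be L × 20 lists already computed elsewhere (for example, derived from PSI-BLAST profiles).

```python
from typing import List

# One row per residue, one column per amino acid (20 for real profiles).
Matrix = List[List[float]]


def cross_transform(pssm: Matrix, psfm: Matrix, max_lag: int = 2) -> List[float]:
    """Hypothetical PSSM x PSFM cross-transformation-style encoder.

    Illustrative assumption, not the published PPCT formula: for every lag d
    in 1..max_lag and every column pair (a, b), average
    pssm[i][a] * psfm[i + d][b] over all positions i where i + d is valid.
    The result has n_cols * n_cols * max_lag entries for any sequence length.
    """
    length = len(pssm)
    if length != len(psfm):
        raise ValueError("PSSM and PSFM must have the same number of rows")
    if max_lag < 1 or length <= max_lag:
        raise ValueError("need 1 <= max_lag < sequence length")
    n_cols = len(pssm[0])
    features: List[float] = []
    for d in range(1, max_lag + 1):
        pairs = length - d  # number of (i, i + d) position pairs at this lag
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(pssm[i][a] * psfm[i + d][b] for i in range(pairs))
                features.append(total / pairs)
    return features


if __name__ == "__main__":
    import random

    random.seed(0)

    def toy_profile(rows: int) -> Matrix:
        # Random stand-in for a PSI-BLAST profile; not real evolutionary data.
        return [[random.random() for _ in range(20)] for _ in range(rows)]

    short_vec = cross_transform(toy_profile(50), toy_profile(50))
    long_vec = cross_transform(toy_profile(300), toy_profile(300))
    print(len(short_vec), len(long_vec))  # both 800 = 20 * 20 * 2
```

Because every lag contributes the same number of pair averages, proteins of 50 and 300 residues map to vectors of the same size, which is the kind of input a random forest classifier expects. Replacing `psfm[i + d][b]` with `pssm[i + d][b]` would turn this into an auto-transformation of a single matrix rather than a cross transformation between the two.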
Pages: 2284-2293
Number of pages: 10
References
53 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] Asghari Mehdi Poursheikhali, 2019, Avicenna Journal of Medical Biotechnology, V11, P104
  • [3] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [4] TriPepSVM: de novo prediction of RNA-binding proteins based on short amino acid motifs
    Bressin, Annkatrin
    Schulte-Sasse, Roman
    Figini, Davide
    Urdaneta, Erika C.
    Beckmann, Benedikt M.
    Marsico, Annalisa
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (09) : 4406 - 4417
  • [5] Some remarks on protein attribute prediction and pseudo amino acid composition
    Chou, Kuo-Chen
    [J]. JOURNAL OF THEORETICAL BIOLOGY, 2011, 273 (01) : 236 - 247
  • [6] A computational platform to identify origins of replication sites in eukaryotes
    Dao, Fu-Ying
    Lv, Hao
    Zulfiqar, Hasan
    Yang, Hui
    Su, Wei
    Gao, Hui
    Ding, Hui
    Lin, Hao
    [J]. BRIEFINGS IN BIOINFORMATICS, 2021, 22 (02) : 1940 - 1950
  • [7] Dong QW, 2015, IEEE INT C BIOINFORM, P470, DOI 10.1109/BIBM.2015.7359730
  • [8] Unravelling the dynamics of RNA degradation by ribonuclease II and its RNA-bound complex
    Frazao, Carlos
    McVey, Colin E.
    Amblar, Monica
    Barbas, Ana
    Vonrhein, Clemens
    Arraiano, Cecilia M.
    Carrondo, Maria A.
    [J]. NATURE, 2006, 443 (7107) : 110 - 114
  • [9] Prediction of Protein Folds: Extraction of New Features, Dimensionality Reduction, and Fusion of Heterogeneous Classifiers
    Ghanty, Pradip
    Pal, Nikhil R.
    [J]. IEEE TRANSACTIONS ON NANOBIOSCIENCE, 2009, 8 (01) : 100 - 110
  • [10] AMINO-ACID SUBSTITUTION MATRICES FROM PROTEIN BLOCKS
    HENIKOFF, S
    HENIKOFF, JG
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) : 10915 - 10919