RBRIdent: An algorithm for improved identification of RNA-binding residues in proteins from primary sequences

Cited by: 11
Authors
Xiong, Dapeng [1]
Zeng, Jianyang [2]
Gong, Haipeng [1]
Affiliations
[1] Tsinghua Univ, Sch Life Sci, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA-binding residues; sequence-based prediction; machine learning; feature selection; FEATURE-SELECTION; INTERFACE RESIDUE; PREDICTION; SITES; CLASSIFICATION; DNA;
DOI
10.1002/prot.24806
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Rapid and correct identification of RNA-binding residues from protein primary sequences is of great importance. In most prevalent machine-learning-based identification methods, however, either some features are inefficiently represented or the redundancy between features is not effectively removed. Both problems may weaken the performance of a classifier and raise its computational complexity. Here, we addressed these problems and developed a better classifier (RBRIdent) to identify RNA-binding residues. In an independent benchmark test, RBRIdent achieved an accuracy of 76.79%, a Matthews correlation coefficient of 0.3819, and an F-measure of 75.58%, remarkably outperforming all prevalent methods. These results suggest the necessity of proper feature description and the essential role of feature selection in this project. All source data and codes are freely available at . Proteins 2015; 83:1068-1077. (c) 2015 Wiley Periodicals, Inc.
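Note on the reported metrics: the sketch below shows the standard textbook definitions of accuracy, F-measure, and the Matthews correlation coefficient for a binary (binding vs. non-binding) residue classifier. It is an illustration only. The function name `binary_metrics` and the confusion-matrix counts are hypothetical, and this record does not state exactly how the paper computed its F-measure (for example, whether it was averaged over classes), so the figures above cannot be reproduced from it.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are hypothetical counts of true/false positives/negatives,
    where "positive" would mean a residue predicted as RNA-binding.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Precision and recall, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # Matthews correlation coefficient, ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


# Illustrative counts only; these are not data from the paper.
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion-matrix cells, which makes it more informative when one class heavily outnumbers the other.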
Pages: 1068-1077
Number of pages: 10