RBRIdent: An algorithm for improved identification of RNA-binding residues in proteins from primary sequences

Cited by: 11
Authors
Xiong, Dapeng [1]
Zeng, Jianyang [2]
Gong, Haipeng [1]
Affiliations
[1] Tsinghua Univ, Sch Life Sci, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RNA-binding residues; sequence-based prediction; machine learning; feature selection; FEATURE-SELECTION; INTERFACE RESIDUE; PREDICTION; SITES; CLASSIFICATION; DNA;
DOI
10.1002/prot.24806
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Rapid and correct identification of RNA-binding residues based on the protein primary sequences is of great importance. In most prevalent machine-learning-based identification methods, however, either some features are inefficiently represented, or the redundancy between features is not effectively removed. Both problems may weaken the performance of a classifier system and raise its computational complexity. Here, we addressed the above problems and developed a better classifier (RBRIdent) to identify the RNA-binding residues. In an independent benchmark test, RBRIdent achieved an accuracy of 76.79%, Matthews correlation coefficient of 0.3819 and F-measure of 75.58%, remarkably outperforming all prevalent methods. These results suggest the necessity of proper feature description and the essential role of feature selection in this project. All source data and codes are freely available at . Proteins 2015; 83:1068-1077. © 2015 Wiley Periodicals, Inc.
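The three scores quoted in the abstract are standard binary-classification metrics. As a minimal sketch, assuming the usual textbook definitions (this record does not give the paper's exact formulas), they can be computed from confusion-matrix counts as follows. The function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, F-measure and MCC from confusion-matrix counts.

    Assumes the standard definitions; "binding residue" is the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # Precision and recall, guarded against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F-measure (F1): harmonic mean of precision and recall.
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient, defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


# Hypothetical counts, for illustration only.
scores = binary_metrics(tp=40, fp=10, tn=40, fn=10)
print(scores)
```

Because the Matthews correlation coefficient uses all four cells of the confusion matrix, it is less sensitive to class imbalance than accuracy alone, which is presumably why residue-level predictors often report it alongside accuracy and F-measure.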
Pages: 1068-1077
Number of pages: 10