A Feature Selection Method for Prediction Essential Protein

Cited by: 31
Authors
Zhong, Jiancheng [1 ,2 ]
Wang, Jianxin [1 ]
Peng, Wei [3 ]
Zhang, Zhen [1 ]
Li, Min [1 ]
Affiliations
[1] Cent S Univ, Sch Informat Sci & Engn, Changsha 410083, Peoples R China
[2] Hunan Normal Univ, Coll Polytech, Changsha 410083, Peoples R China
[3] Kunming Univ Sci & Technol, Ctr Comp, Kunming 650093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
essential protein; feature selection; Protein-Protein Interaction (PPI); machine learning; centrality algorithm; ESSENTIAL GENES; SACCHAROMYCES-CEREVISIAE; IDENTIFICATION; CENTRALITY; NETWORKS; LOCALIZATION; INTEGRATION; ORTHOLOGY; IDENTIFY; DATABASE;
DOI
10.1109/TST.2015.7297748
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Essential proteins are vital to the survival of a cell. Various features are related to protein essentiality, including biological and topological features, and many computational methods have been developed to identify essential proteins from them. However, designing an effective method that selects suitable features and integrates them to predict essential proteins remains a major challenge. In this work, we first collect 26 features and use SVM-RFE to select a subset of them, creating a feature space for predicting essential proteins. We then remove features that share their biological meaning with other features in the feature space, according to their Pearson Correlation Coefficients (PCC). The experiments are carried out on S. cerevisiae data, and six features are determined to be the best subset. To assess the prediction performance of our method, we further compare it across several machine learning methods, such as SVM, Naive Bayes, Bayes Network, and NBTree, given different numbers of input features. The results show that the methods using the six selected features outperform those using other feature sets, which confirms the effectiveness of our feature selection method for essential protein prediction.
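The second step described in the abstract, removing features that are strongly correlated with features already in the feature space, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the ranking is assumed to come from an earlier SVM-RFE pass, the 0.8 cutoff is a hypothetical threshold that the abstract does not state, and the feature names and values are toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def drop_redundant(ranked, columns, threshold=0.8):
    """Walk the features in rank order and keep a feature only if its
    |PCC| with every already-kept feature is at most `threshold`.
    The threshold value is a hypothetical choice for illustration."""
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: names and values are illustrative only.
columns = {
    "degree": [1.0, 2.0, 3.0, 4.0, 5.0],
    "degree_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with "degree"
    "expression": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranked = ["degree", "degree_scaled", "expression"]  # assumed SVM-RFE order, best first
selected = drop_redundant(ranked, columns)
print(selected)
```

Processing features in rank order means that, of two highly correlated features, the one ranked higher by the earlier selection step is the one retained.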
Pages: 491 - 499
Number of pages: 9