A new computational strategy for identifying essential proteins based on network topological properties and biological information

Cited by: 15
Authors
Qin, Chao [1 ]
Sun, Yongqi [1 ]
Dong, Yadong [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China
Source
PLOS ONE | 2017 / Volume 12 / Issue 07
Funding
National Natural Science Foundation of China;
Keywords
ESSENTIAL GENE IDENTIFICATION; SUBCELLULAR-LOCALIZATION; CENTRALITY; ORTHOLOGY; GENOME; INTERACTOME; INTEGRATION; PREDICTION; DATABASE;
DOI
10.1371/journal.pone.0182031
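Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract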
Essential proteins are the proteins that are indispensable to the survival and development of an organism. Deleting a single essential protein will cause lethality or infertility. Identifying and analysing essential proteins are key to understanding the molecular mechanisms of living cells. There are two types of methods for predicting essential proteins: experimental methods, which require considerable time and resources, and computational methods, which overcome the shortcomings of experimental methods. However, the prediction accuracy of computational methods for essential proteins requires further improvement. In this paper, we propose a new computational strategy named CoTB for identifying essential proteins based on a combination of topological properties, subcellular localization information and orthologous protein information. First, we introduce several topological properties of the protein-protein interaction (PPI) network. Second, we propose new methods for measuring orthologous information and subcellular localization and a new computational strategy that uses a random forest prediction model to obtain a probability score for the proteins being essential. Finally, we conduct experiments on four different Saccharomyces cerevisiae datasets. The experimental results demonstrate that our strategy for identifying essential proteins outperforms traditional computational methods and the most recently developed method, SON. In particular, our strategy improves the prediction accuracy to 89, 78, 79, and 85 percent on the YDIP, YMIPS, YMBD and YHQ datasets at the top 100 level, respectively.
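The evaluation idea in the abstract (rank proteins by a score, then check how many of the top-ranked ones are truly essential) can be illustrated with a minimal sketch. This is not the CoTB method: degree centrality stands in for the unspecified topological properties, no random forest or biological information is used, and the toy network, the essential set, and the function names `degree_centrality` and `top_k_precision` are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count distinct interaction partners per protein in an undirected PPI network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {protein: len(partners) for protein, partners in neighbours.items()}


def top_k_precision(scores, essential, k):
    """Fraction of the k highest-scoring proteins that belong to the essential set."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    return sum(p in essential for p in top) / len(top)


# Hypothetical toy data, not taken from the YDIP, YMIPS, YMBD or YHQ datasets.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
toy_essential = {"P1", "P3"}

scores = degree_centrality(toy_edges)
precision_at_2 = top_k_precision(scores, toy_essential, k=2)
print(scores)
print(precision_at_2)
```

Under this reading, the "top 100 level" figures reported in the abstract would correspond to `top_k_precision` with `k=100` on each dataset, with the score coming from the paper's prediction model rather than from degree alone.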
Pages: 24