A new computational strategy for identifying essential proteins based on network topological properties and biological information

Cited by: 15

Authors:

Qin, Chao [1]

Sun, Yongqi [1]

Dong, Yadong [1]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Traff Data Anal & Min, Beijing, Peoples R China

Source:

PLOS ONE | 2017 / Vol. 12 / Issue 07

Funding:

National Natural Science Foundation of China

Keywords:

ESSENTIAL GENE IDENTIFICATION; SUBCELLULAR-LOCALIZATION; CENTRALITY; ORTHOLOGY; GENOME; INTERACTOME; INTEGRATION; PREDICTION; DATABASE;

DOI:

10.1371/journal.pone.0182031

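Chinese Library Classification (CLC):

O [Mathematical and physical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]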

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Essential proteins are the proteins that are indispensable to the survival and development of an organism. Deleting a single essential protein will cause lethality or infertility. Identifying and analysing essential proteins are key to understanding the molecular mechanisms of living cells. There are two types of methods for predicting essential proteins: experimental methods, which require considerable time and resources, and computational methods, which overcome the shortcomings of experimental methods. However, the prediction accuracy of computational methods for essential proteins requires further improvement. In this paper, we propose a new computational strategy named CoTB for identifying essential proteins based on a combination of topological properties, subcellular localization information and orthologous protein information. First, we introduce several topological properties of the protein-protein interaction (PPI) network. Second, we propose new methods for measuring orthologous information and subcellular localization and a new computational strategy that uses a random forest prediction model to obtain a probability score for the proteins being essential. Finally, we conduct experiments on four different Saccharomyces cerevisiae datasets. The experimental results demonstrate that our strategy for identifying essential proteins outperforms traditional computational methods and the most recently developed method, SON. In particular, our strategy improves the prediction accuracy to 89, 78, 79, and 85 percent on the YDIP, YMIPS, YMBD and YHQ datasets at the top 100 level, respectively.
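The abstract mentions two ideas that a small example can make concrete: scoring proteins by a topological property of the PPI network, and reporting accuracy "at the top 100 level", which is read here as the fraction of the top-ranked proteins that are known to be essential. The sketch below is only an illustration under those assumptions. It is not CoTB: it uses plain degree centrality in place of the random forest score, and the network, the protein names, and the helper functions `degree_centrality` and `top_k_precision` are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {protein: number of distinct interaction partners}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_k_precision(scores, essential, k):
    """Fraction of the k highest-scoring proteins that are known essential."""
    # Sort by descending score; break ties by name so the ranking is deterministic.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    top = ranked[:k]
    if not top:
        raise ValueError("k must be positive and scores must be non-empty")
    return sum(p in essential for p in top) / len(top)


# Toy, made-up PPI network and essentiality labels (assumptions for illustration).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
essential = {"P1", "P3"}

scores = degree_centrality(edges)
precision = top_k_precision(scores, essential, k=2)
print(scores)
print(precision)
```

In a setting closer to the one the abstract describes, `scores` would instead hold the probability of being essential produced by a trained classifier, and `k` would be 100.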

Pages: 24