An efficient strategy for identifying essential proteins based on homology, subcellular location and protein-protein interaction information

Citations: 1
Authors
Zhang, Zhihong [1]
Luo, Yingchun [2]
Jiang, Meiping [2]
Wu, Dongjie [3]
Zhang, Wang [4]
Yan, Wei [1]
Zhao, Bihai [1]
Affiliations
[1] Changsha Univ, Coll Comp Engn & Appl Math, Changsha 410022, Hunan, Peoples R China
[2] Hunan Prov Maternal & Child Hlth Care Hosp, Dept Ultrasound, Changsha 410008, Hunan, Peoples R China
[3] Monash Univ, Dept Banking & Finance, Clayton, Vic 3168, Australia
[4] Jinan Univ, Dept Optoelect Engn, Guangzhou 510632, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
essential protein; protein-protein interaction; non-negative matrix symmetric tri-factorization; multiple biological information; subcellular location information; homology information; COMPREHENSIVE RESOURCE; ESSENTIAL GENES; DATABASE; ANNOTATION; ORTHOLOGY; GENOMICS;
DOI
10.3934/mbe.2022296
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
High-throughput biological experiments are expensive and time-consuming. In the past few years, many computational methods based on biological information have been proposed and widely used to understand the underlying biology. However, processing biological data inevitably produces false positives and false negatives, such as the noise in Protein-Protein Interaction (PPI) networks and the noise introduced by integrating several kinds of biological information. Handling this noise is key to essential protein prediction. This paper proposes IEPMSF, a model for Identifying Essential Proteins based on non-negative Matrix Symmetric tri-Factorization and multiple biological information. The model uses only the common-neighbor characteristics of proteins in the PPI network to build a weighted network, then applies non-negative matrix symmetric tri-factorization to find additional potential interactions between proteins and so optimize the weighted network. Next, the initial score of each protein is determined from subcellular location and lineal homology (orthology) information, and a random walk with restart is run on the optimized network to score and rank each protein. We tested the proposed prediction model against current representative approaches using a public database. The experiments show that the new method identifies essential proteins with high efficiency. This effectiveness suggests that the method can substantially reduce the noise that exists in the multi-source biological information itself and the noise caused by integrating it.
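The weighting and ranking steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge-weight formula (one plus the number of shared neighbors), the restart probability of 0.3, the toy four-protein network, and the initial scores are all hypothetical, and the non-negative matrix symmetric tri-factorization step that the paper uses to optimize the network is omitted entirely.

```python
def common_neighbor_weights(edges, n):
    """Build a symmetric weight matrix from an undirected edge list.

    Assumed weighting (not taken from the paper): each edge gets
    1 + (number of neighbors shared by its two endpoints).
    """
    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    weights = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        w = 1.0 + len(neighbors[u] & neighbors[v])
        weights[u][v] = weights[v][u] = w
    return weights


def random_walk_with_restart(weights, initial, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * P @ p + restart * p0 until convergence.

    P is the column-normalized weight matrix and p0 is the normalized
    vector of initial scores.
    """
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    total = sum(initial)
    p0 = [s / total for s in initial]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: proteins 0, 1, 2 form a triangle; protein 3 hangs off 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
weights = common_neighbor_weights(edges, 4)

# Hypothetical initial scores; in the paper these come from subcellular
# location and homology information.
initial = [1.0, 1.0, 1.0, 1.0]

scores = random_walk_with_restart(weights, initial)
ranking = sorted(range(4), key=lambda i: scores[i], reverse=True)
```

Here `ranking` lists protein indices from the highest to the lowest final score. The restart term pulls the walk back toward the initial scores at every step, so proteins with strong prior evidence stay highly ranked even when their neighborhood in the network is sparse or noisy.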
Pages: 6331-6343
Number of pages: 13