An efficient strategy for identifying essential proteins based on homology, subcellular location and protein-protein interaction information

Cited by: 1
Authors
Zhang, Zhihong [1]
Luo, Yingchun [2]
Jiang, Meiping [2]
Wu, Dongjie [3]
Zhang, Wang [4]
Yan, Wei [1]
Zhao, Bihai [1]
Affiliations
[1] Changsha Univ, Coll Comp Engn & Appl Math, Changsha 410022, Hunan, Peoples R China
[2] Hunan Prov Maternal & Child Hlth Care Hosp, Dept Ultrasound, Changsha 410008, Hunan, Peoples R China
[3] Monash Univ, Dept Banking & Finance, Clayton, Vic 3168, Australia
[4] Jinan Univ, Dept Optoelect Engn, Guangzhou 510632, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
essential protein; protein-protein interaction; non-negative matrix symmetric tri-factorization; multiple biological information; subcellular location information; homology information; COMPREHENSIVE RESOURCE; ESSENTIAL GENES; DATABASE; ANNOTATION; ORTHOLOGY; GENOMICS
DOI
10.3934/mbe.2022296
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
High-throughput biological experiments are expensive and time-consuming. In recent years, many computational methods based on biological information have been proposed and widely used to understand the underlying biology. However, processing biological data inevitably introduces false positives and false negatives, such as noise in Protein-Protein Interaction (PPI) networks and noise generated when multiple types of biological information are integrated. Handling this noise is key to predicting essential proteins. This paper proposes IEPMSF, a model for Identifying Essential Proteins based on non-negative Matrix Symmetric tri-Factorization and multiple biological information. IEPMSF first uses only the common-neighbor characteristics of proteins in the PPI network to construct a weighted network, and then applies non-negative matrix symmetric tri-factorization to uncover additional potential interactions between proteins, thereby optimizing the weighted network. Next, it determines the initial score of each protein from subcellular location and homology (orthology) information, and runs a random walk with restart on the optimized network to score and rank every protein. We compared the proposed prediction model with current representative approaches on a public database, and the experimental results show that the new method identifies essential proteins efficiently. These results indicate that the method can substantially mitigate both the noise inherent in multi-source biological information and the noise introduced by integrating it.
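The abstract outlines a four-step pipeline (common-neighbor weighting, symmetric tri-factorization, initial scores from subcellular location and homology, random walk with restart) but gives no formulas. The Python/NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Edge weights are assumed to be raw common-neighbor counts. The factorization W ≈ H S H^T uses a generic multiplicative-update scheme. The optimized network is a hypothetical blend, controlled by `lam`, of the normalized observed and reconstructed matrices. The initial score is an assumed equal-weight average of per-protein subcellular and homology scores (`sub_score`, `hom_score`, assumed to lie in [0, 1]), and `alpha` is the restart probability. All function names and default parameter values are hypothetical.

```python
import numpy as np


def common_neighbor_weights(adj):
    """Weight each observed interaction (u, v) by |N(u) ∩ N(v)|.

    Assumption: raw common-neighbor counts; the paper's exact formula may differ.
    """
    # (adj @ adj)[u, v] counts the common neighbors of u and v;
    # multiplying element-wise by adj keeps weights only on observed edges.
    return ((adj @ adj) * adj).astype(float)


def symmetric_nmtf(w, k, iters=200, seed=0, eps=1e-9):
    """Approximate w ≈ h @ s @ h.T with h (n x k) >= 0 and symmetric s (k x k) >= 0.

    Uses a generic multiplicative-update scheme (assumed, not the paper's rule).
    """
    rng = np.random.default_rng(seed)
    n = w.shape[0]
    h = rng.random((n, k)) + eps
    s = rng.random((k, k)) + eps
    s = (s + s.T) / 2
    for _ in range(iters):
        # Ratio of the negative to the positive part of the gradient w.r.t. h.
        h *= np.sqrt((w @ h @ s) / (h @ s @ h.T @ h @ s + eps))
        hth = h.T @ h
        # Same idea for s; both factors stay non-negative.
        s *= (h.T @ w @ h) / (hth @ s @ hth + eps)
        s = (s + s.T) / 2  # guard against floating-point asymmetry
    return h, s


def random_walk_with_restart(w, p0, alpha=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - alpha) * T p + alpha * p0 with T the column-normalized w."""
    col = w.sum(axis=0)
    trans = np.divide(w, col, out=np.zeros_like(w), where=col > 0)
    dangling = col == 0
    p = p0.copy()
    for _ in range(max_iter):
        # Mass on nodes without outgoing weight is redistributed according to p0.
        walk = trans @ p + p[dangling].sum() * p0
        nxt = (1 - alpha) * walk + alpha * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


def rank_proteins(adj, sub_score, hom_score, k=2, lam=0.5, alpha=0.3):
    """Hypothetical end-to-end pipeline; returns one score per protein."""
    w = common_neighbor_weights(adj)
    h, s = symmetric_nmtf(w, k)
    recon = h @ s @ h.T
    np.fill_diagonal(recon, 0.0)  # ignore self-interactions
    # Assumed blend of observed and reconstructed (potential) interactions.
    w_opt = (1 - lam) * w / max(w.max(), 1e-12) + lam * recon / max(recon.max(), 1e-12)
    p0 = (sub_score + hom_score) / 2.0  # assumed equal weighting
    p0 = p0 / p0.sum()
    return random_walk_with_restart(w_opt, p0, alpha)


# Toy PPI network: two triangles sharing edge (1, 2), plus a pendant protein 4.
adj = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]:
    adj[u, v] = adj[v, u] = 1
scores = rank_proteins(adj, np.full(5, 0.5), np.full(5, 0.5))
print("ranking (most essential first):", np.argsort(-scores))
```

In the toy graph, edge (3, 4) shares no common neighbor, so protein 4 receives zero weight and is scored only through the restart term. This illustrates how common-neighbor weighting down-weights interactions that may be spurious. Setting `lam=0` disables the factorization step, which is a quick way to see how much the reconstructed interactions change the ranking.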
Pages: 6331-6343
Number of pages: 13