A deep learning framework for identifying essential proteins based on multiple biological information

Cited by: 16
Authors
Yue, Yi [1, 2, 3, 4]
Ye, Chen [1, 2]
Peng, Pei-Yun [1, 2]
Zhai, Hui-Xin [1, 2]
Ahmad, Iftikhar [1, 2]
Xia, Chuan [1, 2]
Wu, Yun-Zhi [1, 2, 4]
Zhang, You-Hua [1, 2, 3]
Affiliations
[1] Anhui Agr Univ, Anhui Prov Engn Lab Beidou Precis Agr Informat, Hefei 230036, Peoples R China
[2] Anhui Agr Univ, Sch Informat & Comp, Hefei 230036, Peoples R China
[3] Anhui Agr Univ, Sch Life Sci, Hefei 230036, Peoples R China
[4] Anhui Agr Univ, State Key Lab Tea Plant Biol & Utilizat, Hefei 230036, Peoples R China
Keywords
Essential protein; Deep learning; Protein-protein interaction network; Subcellular localization; Gene expression; Centrality
DOI
10.1186/s12859-022-04868-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Essential proteins perform vital functions in cellular processes and are indispensable for the survival and reproduction of the organism. Traditional centrality methods perform poorly on complex protein-protein interaction (PPI) networks, and machine learning approaches based on high-throughput data do not exploit the temporal and spatial dimensions of biological information.
Results: We propose a deep learning framework that predicts essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. The node2vec method is applied to learn continuous feature representations for proteins in the PPI network, which capture the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract features and capture trends in gene expression over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. Additionally, a sampling method is used to mitigate the impact of class imbalance when training the model. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Comparative experiments likewise show that our way of processing the different kinds of biological information is preferable.
Conclusions: Our proposed deep learning framework effectively identifies essential proteins by integrating multiple types of biological data. The results show that a broader selection of subcellular localization information significantly improves prediction and that depthwise separable convolution applied to gene expression profiles enhances performance.
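As a rough illustration of the depthwise separable convolution mentioned in the abstract, the sketch below applies it to a toy expression matrix in plain Python. It is not the authors' implementation: the function names, the toy values, the difference kernel, and the mixing weights are all assumptions made for this example, and the layout (one channel per experimental condition, one value per time point) is only one plausible reading of "gene expression over time under different experimental conditions".

```python
def depthwise_conv1d(x, kernels):
    """Slide one kernel over each channel independently (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """1x1 convolution: mix the channels at every time step."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def depthwise_separable_conv1d(x, kernels, weights):
    """Depthwise step (per-channel filtering) followed by pointwise step."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


# Hypothetical toy profile for one gene: 2 conditions x 4 time points.
expression = [
    [1.0, 2.0, 4.0, 7.0],  # condition A
    [3.0, 3.0, 2.0, 0.0],  # condition B
]
# One kernel per channel; [-1, 1] measures the change between time points.
kernels = [[-1.0, 1.0], [-1.0, 1.0]]
# Two output channels: condition A alone, and the mean of A and B.
weights = [[1.0, 0.0], [0.5, 0.5]]

features = depthwise_separable_conv1d(expression, kernels, weights)
print(features)
```

The depthwise step filters each condition's time series separately, so temporal trends are extracted per condition; the pointwise step then combines conditions without looking across time. In a trained model the kernels and weights would be learned rather than fixed by hand.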
Pages: 27