Identifying Protein Subcellular Location with Embedding Features Learned from Networks

Cited by: 43
Authors
Liu, Hongwei [1]
Hu, Bin [2]
Chen, Lei [1]
Lu, Lin [3]
Affiliations
[1] Shanghai Maritime Univ, Coll Informat Engn, Shanghai 201306, Peoples R China
[2] Guangdong Acad Agr Sci, Guangdong Publ Lab Anim Breeding & Nutr, State Key Lab Livestock & Poultry Breeding, Inst Anim Sci,Guangdong Prov Key Lab Anim Breedin, Guangzhou 510640, Peoples R China
[3] Columbia Univ, Dept Radiol, Med Ctr, New York, NY USA
Keywords
Protein subcellular location prediction; network embedding algorithm; DeepWalk; Node2vec; Mashup; machine learning algorithm; support vector machine; random forest; AMINO-ACID-COMPOSITION; FUNCTIONAL DOMAIN COMPOSITION; PREDICTION; LOCALIZATION; ALGORITHM
DOI
10.2174/1570164617999201124142950
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Identifying protein subcellular location is an important problem because subcellular location is closely related to protein function. Determining locations through biological experiments is the fundamental approach. However, these experiments are costly and time-consuming. The alternative is to design effective computational methods.
Objective: To date, several computational methods have been proposed for this problem. However, these methods mainly adopt features derived from the proteins themselves. Meanwhile, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms bridge networks and traditional classification algorithms, and thus provide a new way to construct models for predicting protein subcellular location.
Methods: In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) applied to one or multiple protein networks. The obtained features were learned by a machine learning algorithm (support vector machine or random forest) to construct the model. Cross-validation was adopted to evaluate all constructed models.
Results: Under cross-validation, the embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location. The model based on these features was superior to some classic models.
Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm were effective for building models to predict protein subcellular location, providing new pipelines for building more efficient models.
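As a rough illustration of how an embedding algorithm such as DeepWalk turns a network into training material, the sketch below samples truncated uniform random walks from a small graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy network, the protein IDs, the function name `random_walks` and its parameters are all hypothetical. The later steps (training a skip-gram model on the walks to obtain node vectors, then fitting a support vector machine or random forest under cross-validation) are omitted.

```python
import random


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Sample truncated uniform random walks starting from every node.

    adjacency: dict mapping each node to a list of its neighbors.
    Returns a list of walks, each walk being a list of node IDs.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical toy protein network (undirected, stored as adjacency lists).
toy_network = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2", "P4"],
    "P4": ["P3"],
}

walks = random_walks(toy_network)
for walk in walks:
    print(" -> ".join(walk))
```

Each walk plays the role of a sentence and each protein the role of a word, which is why a word-embedding model can then be trained on the walks to produce one feature vector per protein.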
Pages: 646-660
Page count: 15