Identifying Protein Subcellular Location with Embedding Features Learned from Networks

Cited by: 43
Authors
Liu, Hongwei [1]
Hu, Bin [2]
Chen, Lei [1]
Lu, Lin [3]
Affiliations
[1] Shanghai Maritime Univ, Coll Informat Engn, Shanghai 201306, Peoples R China
[2] Guangdong Acad Agr Sci, Guangdong Publ Lab Anim Breeding & Nutr, State Key Lab Livestock & Poultry Breeding, Inst Anim Sci,Guangdong Prov Key Lab Anim Breedin, Guangzhou 510640, Peoples R China
[3] Columbia Univ, Dept Radiol, Med Ctr, New York, NY USA
Keywords
Protein subcellular location prediction; network embedding algorithm; DeepWalk; Node2vec; Mashup; machine learning algorithm; support vector machine; random forest; AMINO-ACID-COMPOSITION; FUNCTIONAL DOMAIN COMPOSITION; PREDICTION; LOCALIZATION; ALGORITHM
DOI
10.2174/1570164617999201124142950
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Identification of protein subcellular location is an important problem because subcellular location is closely related to protein function. Biological experiments are the fundamental way to determine these locations. However, such experiments are costly and time-consuming. The alternative is to design effective computational methods.
Objective: To date, several computational methods have been proposed in this regard. However, these methods mainly adopted features derived from the proteins themselves. Meanwhile, with the development of network techniques, several embedding algorithms have been proposed that encode the nodes of a network into feature vectors. Such algorithms connect networks with traditional classification algorithms. Thus, they provide a new way to construct models for the prediction of protein subcellular location.
Methods: In this study, we analyzed features produced by three network embedding algorithms (DeepWalk, Node2vec and Mashup) that were applied to one or multiple protein networks. The obtained features were learned by one machine learning algorithm (support vector machine or random forest) to construct the model. The cross-validation method was adopted to evaluate all constructed models.
Results: Under cross-validation, the embedding features yielded by Mashup on multiple networks were quite informative for predicting protein subcellular location. The model based on these features was superior to some classic models.
Conclusion: Embedding features yielded by a proper and powerful network embedding algorithm were effective for building a model for the prediction of protein subcellular location, providing new pipelines to build more efficient models.
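The pipeline outlined in the Methods part of the abstract (encode network nodes as feature vectors, train a classifier on them, evaluate by cross-validation) can be illustrated with a small standard-library sketch. It is not the authors' implementation and rests on several assumptions: the two-cluster network and the location labels are invented toy data, the walks follow the uniform random-walk sampling that DeepWalk starts from, normalised co-occurrence counts stand in for the skip-gram training step, and a nearest-centroid rule stands in for the support vector machine or random forest named in the abstract. All function names are hypothetical.

```python
import random


def toy_network():
    """Hypothetical network: two 5-node cliques joined by one edge (not real data)."""
    groups = [[f"p{i}" for i in range(5)], [f"p{i}" for i in range(5, 10)]]
    adj = {n: [] for g in groups for n in g}
    for g in groups:
        for u in g:
            adj[u].extend(v for v in g if v != u)
    adj["p4"].append("p5")
    adj["p5"].append("p4")
    # Invented labels, one per clique, standing in for subcellular locations.
    labels = {n: ("nucleus" if n in groups[0] else "membrane") for n in adj}
    return adj, labels


def random_walks(adj, walk_len, walks_per_node, rng):
    """Uniform truncated random walks, the sampling step DeepWalk begins with."""
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_features(walks, nodes, window):
    """Assumed stand-in for skip-gram: row-normalised window co-occurrence counts."""
    index = {n: i for i, n in enumerate(nodes)}
    counts = {n: [0.0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[max(0, i - window): i + window + 1]:
                if v != u:
                    counts[u][index[v]] += 1.0
    feats = {}
    for n, row in counts.items():
        total = sum(row) or 1.0  # guard against isolated nodes
        feats[n] = [c / total for c in row]
    return feats


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, x):
    """Assumed stand-in classifier: pick the label whose mean vector is closest to x."""
    sums, sizes = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, val in enumerate(vec):
            acc[j] += val
        sizes[label] = sizes.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / sizes[label] - xi) ** 2 for s, xi in zip(sums[label], x))

    return min(sorted(sums), key=squared_distance)


def cross_val_accuracy(x, y, k, rng):
    """Fraction of samples predicted correctly when each fold is held out in turn."""
    correct = 0
    for held_out in k_fold_indices(len(x), k, rng):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            nearest_centroid_predict(train_x, train_y, x[i]) == y[i] for i in held_out
        )
    return correct / len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    adj, labels = toy_network()
    nodes = sorted(adj)
    walks = random_walks(adj, walk_len=6, walks_per_node=4, rng=rng)
    feats = cooccurrence_features(walks, nodes, window=2)
    acc = cross_val_accuracy([feats[n] for n in nodes], [labels[n] for n in nodes], 5, rng)
    print(f"5-fold cross-validation accuracy on the toy network: {acc:.2f}")
```

In a realistic setting the feature step would be replaced by an actual embedding method and the classifier by a tuned model; the sketch only shows how the three stages fit together.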
Pages: 646 - 660
Page count: 15
Related papers
50 in total
  • [1] Identification of Human Protein Subcellular Location with Multiple Networks
    Wang, Rui
    Chen, Lei
    CURRENT PROTEOMICS, 2022, 19 (04) : 344 - 356
  • [2] Predicting protein subcellular location with network embedding and enrichment features
    Pan, Xiaoyong
    Lu, Lin
    Cai, Yu-Dong
    BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS, 2020, 1868 (10):
  • [3] Predicting Protein Subcellular Location Based on a Novel Sequence Numerical Model
    Chen, Haowen
    Chen, Xia
    Hu, Qingming
    Cao, Zhi
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2015, 12 (01) : 82 - 87
  • [4] A Novel Ensemble Technique for Protein Subcellular Location Prediction
    Rozza, Alessandro
    Lombardi, Gabriele
    Re, Matteo
    Casiraghi, Elena
    Valentini, Giorgio
    Campadelli, Paola
    ENSEMBLES IN MACHINE LEARNING APPLICATIONS, 2011, 373 : 151 - 167
  • [5] Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks
    Liu, Guang-Hui
    Zhang, Bei-Wei
    Qian, Gang
    Wang, Bin
    Mao, Bo
    Bichindaritz, Isabelle
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (06) : 1966 - 1980
  • [6] A complexity-based method for predicting protein subcellular location
    Zheng, Xiaoqi
    Liu, Taigang
    Wang, Jun
    AMINO ACIDS, 2009, 37 (02) : 427 - 433
  • [7] Consistency and variation of protein subcellular location annotations
    Xu, Ying-Ying
    Zhou, Hang
    Murphy, Robert F.
    Shen, Hong-Bin
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2021, 89 (02) : 242 - 250
  • [8] Multitask Learning for Protein Subcellular Location Prediction
    Xu, Qian
    Pan, Sinno Jialin
    Xue, Hannah Hong
    Yang, Qiang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (03) : 748 - 759
  • [9] Predicting Apoptosis Protein Subcellular Location with PseAAC by Incorporating Tripeptide Composition
    Liao, Bo
    Jiang, Jun-Bao
    Zeng, Qing-Guang
    Zhu, Wen
    PROTEIN AND PEPTIDE LETTERS, 2011, 18 (11) : 1086 - 1092
  • [10] Protein subcellular location prediction
    Chou, KC
    Elrod, DW
    PROTEIN ENGINEERING, 1999, 12 (02) : 107 - 118