LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine

Cited by: 4
Authors
Wu, Meiqi [1 ]
Lu, Pengchao [2 ]
Yang, Yingxi [3 ]
Liu, Liwen [1 ]
Wang, Hui [4 ]
Xu, Yan [1 ]
Chu, Jixun [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Dept Appl Math, Beijing 100083, Peoples R China
[2] China Petr Pipeline Engn Co Ltd, Equipment Leasing Co, Langfang City 065000, Hebei, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Chem & Biol Engn, Hong Kong, Peoples R China
[4] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
Keywords
Lysine lipoylation; prediction; amino acids; support vector machine; post-translational modifications; scoring matrix; PYRUVATE-DEHYDROGENASE COMPLEX; LIPOIC ACID; ACETYLATION; CANCER;
DOI
10.2174/1389202919666191014092843
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, is considered one of the most important processes in biology. Identifying lysine lipoylated sites is the key to a comprehensive understanding of the regulatory mechanism of lysine lipoylation. Because experimental methods are expensive, laborious, and complex, computational methods for predicting lipoylation sites are urgently needed.
Methodology: In this work, a predictor named LipoSVM is developed to accurately predict lipoylation sites. To overcome the problem of unbalanced samples, the synthetic minority over-sampling technique (SMOTE) is used to balance the positive and negative samples. Furthermore, training sets with different ratios of positive to negative samples are evaluated.
Results: After comparing five encoding schemes and five classification algorithms, LipoSVM is constructed from a training set with a 1:1 ratio of positive to negative samples, combining the position-specific scoring matrix with a support vector machine. The best model achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. An analysis of lysine lipoylation and non-lipoylation fragments shows statistically significant differences between them.
Conclusion: A good predictor for lysine lipoylation is built on the position-specific scoring matrix and the support vector machine. The LipoSVM web server can be freely downloaded from https://github.com/stars20180811/LipoSVM.
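The balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the function names `smote` and `balance`, the neighbour count `k`, and the assumption that the positive (lipoylated) samples are the minority class with at least two members are all illustrative choices. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority
    samples and their k nearest minority neighbours (SMOTE-style sketch).
    Assumes len(minority) >= 2."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(positives, negatives, k=5, seed=0):
    """Oversample the positives (assumed minority) up to a 1:1 ratio."""
    n_new = max(0, len(negatives) - len(positives))
    return positives + smote(positives, n_new, k, seed), negatives


if __name__ == "__main__":
    # Toy 2-D feature vectors; real inputs would be encoded sequence windows.
    pos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    neg = [[5.0, 5.0] for _ in range(10)]
    balanced_pos, balanced_neg = balance(pos, neg, k=2)
    print(len(balanced_pos), len(balanced_neg))
```

In practice the balanced feature vectors would then be passed to a classifier such as a support vector machine. Oversampling is normally applied to the training portion only, so that synthetic samples do not leak into the evaluation data.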
Pages: 362-370
Page count: 9