LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine

Cited by: 4
Authors
Wu, Meiqi [1]
Lu, Pengchao [2]
Yang, Yingxi [3]
Liu, Liwen [1]
Wang, Hui [4]
Xu, Yan [1]
Chu, Jixun [1]
Affiliations
[1] Univ Sci & Technol Beijing, Dept Appl Math, Beijing 100083, Peoples R China
[2] China Petr Pipeline Engn Co Ltd, Equipment Leasing Co, Langfang City 065000, Hebei, Peoples R China
[3] Hong Kong Univ Sci & Technol, Dept Chem & Biol Engn, Hong Kong, Peoples R China
[4] Chinese Acad Sci, Inst Comp Technol, Beijing 100080, Peoples R China
Keywords
Lysine lipoylation; prediction; amino acids; support vector machine; post-translational modifications; scoring matrix; pyruvate dehydrogenase complex; lipoic acid; acetylation; cancer
DOI
10.2174/1389202919666191014092843
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
Background: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, is considered one of the most important processes in biology. The key to a comprehensive understanding of the regulatory mechanism of lysine lipoylation is identifying lipoylated lysine sites. Because experimental methods for doing so are expensive, laborious, and complex, computational methods for predicting lipoylation sites are urgently needed.

Methodology: In this work, a predictor named LipoSVM is developed to accurately predict lipoylation sites. To overcome the imbalance between positive and negative samples, the synthetic minority over-sampling technique (SMOTE) is used to balance the two classes. In addition, training sets with different ratios of positive to negative samples are evaluated.

Results: After comparing five encoding schemes and five classification algorithms, LipoSVM is finally built from a training set with a positive-to-negative sample ratio of 1:1, combining position-specific scoring matrix (PSSM) features with a support vector machine (SVM). The best model achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. Statistical analysis shows significant differences between lipoylated and non-lipoylated lysine fragments.

Conclusion: An effective predictor for lysine lipoylation is built on a position-specific scoring matrix and a support vector machine. LipoSVM can be freely downloaded from https://github.com/stars20180811/LipoSVM.
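The abstract states that SMOTE is used to bring the positive (lipoylated) and negative samples to a 1:1 ratio but does not give the authors' settings. The following is a minimal, stdlib-only sketch of SMOTE-style oversampling, assuming each sample is already a fixed-length list of floats (for example a flattened PSSM window). The function names `smote` and `balance_1_to_1`, the neighbour count `k=5`, and the fixed seed are hypothetical illustration choices, not LipoSVM's code. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE-style oversampler (illustration, not LipoSVM's code).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nn)])
    return synthetic


def balance_1_to_1(positives, negatives, k=5, seed=0):
    """Hypothetical helper: oversample positives until positives:negatives is 1:1."""
    deficit = len(negatives) - len(positives)
    if deficit <= 0:
        return positives, negatives
    return positives + smote(positives, deficit, k, seed), negatives


if __name__ == "__main__":
    pos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    neg = [[5.0, 5.0] for _ in range(10)]
    pos_bal, neg_bal = balance_1_to_1(pos, neg, k=2)
    print(len(pos_bal), len(neg_bal))  # 10 10
```

For real feature matrices, the `SMOTE` class in `imblearn.over_sampling` (imbalanced-learn) provides the same technique through `fit_resample`. In a cross-validation setting, a common precaution is to oversample only the training folds, so that synthetic points interpolated from a test sample do not end up in the training data.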
Pages: 362-370
Page count: 9