iSuc-PseOpt: Identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset

Cited by: 255
Authors
Jia, Jianhua [1 ,2 ]
Liu, Zi [1 ]
Xiao, Xuan [1 ,2 ]
Liu, Bingxiang [1 ]
Chou, Kuo-Chen [2 ,3 ]
Affiliations
[1] Jing De Zhen Ceram Inst, Dept Comp, Jing De Zhen 333403, Peoples R China
[2] Gordon Life Sci Inst, Boston, MA 02478 USA
[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
Keywords
Lysine succinylation; Sequence-coupling model; PseAAC; Optimize training dataset; Target cross-validation; AMINO-ACID-COMPOSITION; S-NITROSYLATION SITES; LABEL LEARNING CLASSIFIER; PROTEASE CLEAVAGE SITES; SUBCELLULAR-LOCALIZATION; PHYSICOCHEMICAL PROPERTIES; GENERAL-FORM; WEB-SERVER; K-TUPLE; POSTTRANSLATIONAL MODIFICATIONS;
DOI
10.1016/j.ab.2015.12.009
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Succinylation is a posttranslational modification (PTM) in which a succinyl group is added to a Lys (K) residue of a protein molecule. Lysine succinylation plays an important role in orchestrating various biological processes, but it is also associated with some diseases. Both basic research and drug development therefore face the following problem: given an uncharacterized protein sequence containing many Lys residues, which of them can be succinylated and which cannot? With the avalanche of protein sequences generated in the postgenomic age, answering this question has become even more urgent. Fortunately, statistically significant experimental data for succinylated sites in proteins have recently become available, an indispensable prerequisite for developing a computational method to address the problem. By incorporating sequence-coupling effects into the general pseudo amino acid composition (PseAAC), and by using KNNC (K-nearest neighbors cleaning) treatment and IHTS (inserting hypothetical training samples) treatment to optimize the training dataset, a predictor called iSuc-PseOpt has been developed. Rigorous cross-validations indicated that it remarkably outperformed the existing method. A user-friendly web server for iSuc-PseOpt has been established at http://www.jci-bioinfo.cn/iSuc-PseOpt, where users can easily get their desired results without needing to go through the complicated mathematical equations involved. © 2015 Elsevier Inc. All rights reserved.
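The abstract names two treatments for the imbalanced training set, KNNC and IHTS, without giving their details. The following is only a minimal sketch of the general idea, under stated assumptions: that the cleaning step drops a negative (non-succinylated) sample when any of its K nearest neighbors is positive, and that hypothetical samples are created by linear interpolation between randomly paired positive samples (a simplification of SMOTE-style oversampling). The function names, the Euclidean distance, and the default `k=3` are illustrative choices, not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_clean(positives, negatives, k=3):
    """Assumed cleaning rule: keep a negative sample only if none of its
    k nearest neighbors (among all other samples) is a positive sample."""
    kept = []
    for i, neg in enumerate(negatives):
        neighbors = [(euclidean(neg, p), True) for p in positives]
        neighbors += [
            (euclidean(neg, n), False)
            for j, n in enumerate(negatives)
            if j != i
        ]
        neighbors.sort(key=lambda pair: pair[0])
        if not any(is_positive for _, is_positive in neighbors[:k]):
            kept.append(neg)
    return kept


def insert_hypothetical(positives, n_new, rng):
    """Assumed insertion rule: each new sample lies on the line segment
    between two distinct, randomly chosen positives (needs >= 2 positives)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(positives, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


def balance(positives, negatives, k=3, seed=0):
    """Clean the negatives, then pad the positives up to the same size."""
    rng = random.Random(seed)
    cleaned = knn_clean(positives, negatives, k)
    deficit = max(0, len(cleaned) - len(positives))
    return positives + insert_hypothetical(positives, deficit, rng), cleaned


if __name__ == "__main__":
    pos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    neg = [[0.1, 0.1], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0], [5.5, 5.5]]
    new_pos, new_neg = balance(pos, neg)
    print(len(new_pos), len(new_neg))
```

In this toy run the negative sample sitting inside the positive cluster is discarded, and the positive set is padded with interpolated points until both classes have the same size. In practice the feature vectors would be the PseAAC encodings of the Lys-centered peptide segments rather than 2-D points.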
Pages: 48-56
Page count: 9