pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC

Cited by: 56
Authors
Cheng, Xiang [1, 3]
Xiao, Xuan [1, 3]
Chou, Kuo-Chen [2, 3]
Affiliations
[1] Jingdezhen Ceram Inst, Comp Dept, Jingdezhen, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Informat Biol, Chengdu 610054, Sichuan, Peoples R China
[3] Gordon Life Sci Inst, 53 S Cottage Rd, Boston, MA 02478 USA
Funding
National Natural Science Foundation of China
Keywords
Multi-label system; Gram-negative bacterial proteins; IHTS; Five-step rules; ML-GKR; Chou's intuitive metrics; AMINO-ACID-COMPOSITION; SEQUENCE-BASED PREDICTOR; IDENTIFY RECOMBINATION SPOTS; INCORPORATING EVOLUTIONARY INFORMATION; PSEUDO NUCLEOTIDE COMPOSITION; LYSINE SUCCINYLATION SITES; AVERAGE CHEMICAL-SHIFT; ALIGNMENT-FREE METHOD; 3 DIFFERENT MODES; MEMBRANE-PROTEINS;
DOI
10.1016/j.jtbi.2018.09.005
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
One of the hottest topics in molecular cell biology is determining the subcellular localization of proteins from different organisms, because it is crucially important for both basic research and drug development. Recently, a predictor called "pLoc-mGneg" was developed for identifying the subcellular localization of Gram-negative bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is a very powerful predictor, further effort is needed to improve it, because pLoc-mGneg was trained on an extremely skewed dataset in which some subsets (subcellular locations) were about 5 to 70 times the size of the others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. To alleviate this bias, we have developed a new bias-reducing predictor called pLoc_bal-mGneg by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset indicate that the proposed predictor is remarkably superior to pLoc-mGneg, the existing state-of-the-art predictor for identifying the subcellular localization of Gram-negative bacterial proteins. To maximize convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGneg/, by which users can easily get their desired results without needing to go through the detailed mathematics. © 2018 Elsevier Ltd. All rights reserved.
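The idea of quasi-balancing a skewed training set can be illustrated with a minimal sketch. This is not the authors' procedure (the keywords name IHTS, whose details are not given in this record); it assumes a simple interpolation-based oversampling scheme, and the function name `quasi_balance`, the `ratio` parameter, and the labels `loc_A` and `loc_B` are all hypothetical. It also treats each subset independently, which ignores that a multiplex protein may belong to several subsets at once.

```python
import random
from typing import Dict, List

Vector = List[float]


def quasi_balance(
    subsets: Dict[str, List[Vector]], ratio: float = 0.5, seed: int = 0
) -> Dict[str, List[Vector]]:
    """Grow each small subset until it holds at least ratio * (largest subset size) samples.

    Assumption (not from the paper): each synthetic sample is a random
    convex combination of two real samples drawn from the same subset.
    Subsets that already meet the target are copied unchanged.
    """
    rng = random.Random(seed)
    largest = max(len(samples) for samples in subsets.values())
    target = max(1, int(ratio * largest))

    balanced: Dict[str, List[Vector]] = {}
    for label, samples in subsets.items():
        grown = [list(s) for s in samples]
        # Skip empty subsets: there is nothing to interpolate between.
        while samples and len(grown) < target:
            a, b = rng.choice(samples), rng.choice(samples)
            t = rng.random()
            grown.append([x + t * (y - x) for x, y in zip(a, b)])
        balanced[label] = grown
    return balanced


# Toy data: one subset is 35 times larger than the other.
toy = {
    "loc_A": [[float(i), 1.0] for i in range(70)],
    "loc_B": [[0.0, 0.0], [1.0, 1.0]],
}
balanced = quasi_balance(toy, ratio=0.5)
print({label: len(samples) for label, samples in balanced.items()})
# prints {'loc_A': 70, 'loc_B': 35}
```

The `ratio` below 1 leaves the result only approximately balanced, which is one plausible reading of "quasi-balancing"; the exact target used by the authors is not stated here.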
Pages: 92-102
Number of pages: 11