Conformational B-Cell Epitopes Prediction from Sequences Using Cost-Sensitive Ensemble Classifiers and Spatial Clustering

Cited by: 44
Authors
Zhang, Jian [1]
Zhao, Xiaowei [1]
Sun, Pingping [1, 2]
Gao, Bo [1]
Ma, Zhiqiang [1]
Affiliations
[1] NE Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Peoples R China
[2] NE Normal Univ, Engn Lab Drug Gene & Prot Screening, Changchun 130117, Peoples R China
Funding
China Postdoctoral Science Foundation; Specialized Research Fund for the Doctoral Program of Higher Education;
Keywords
EVOLUTIONARY INFORMATION; UNSTRUCTURED PROTEINS; ANTIGENIC EPITOPES; RESIDUES; SERVER; CLASSIFICATION; ANTIBODY; SITES; TOOL;
DOI
10.1155/2014/689219
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
B-cell epitopes are regions of the antigen surface that are recognized by specific antibodies and elicit an immune response. Identifying the epitopes of a given antigen chain has vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource-intensive, so it can benefit from computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting antigenic determinant residues, and a spatial clustering algorithm is then adopted to identify potential epitopes. First, we explore various discriminative features derived from primary sequences. Second, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Third, we adopt a spatial clustering algorithm to determine which residues may form epitopes. Based on these strategies, a new predictor called CBEP (conformational B-cell epitopes prediction) is proposed in this study. CBEP achieves good prediction performance, with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) under leave-one-out cross-validation (LOOCV). Compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity. A web server named CBEP, which implements the proposed method, is available for academic use.
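The three steps named in the abstract (scoring residues, making a cost-sensitive decision, and spatially clustering the predicted residues) can be sketched as follows. This is a minimal illustration and not CBEP's actual implementation: the abstract does not specify the base classifiers, the misclassification costs, or the clustering rule. The score averaging, the 5:1 cost ratio, the 6 Å cutoff, the single-linkage clustering, all function names, and the toy data are assumptions made for the example.

```python
from math import dist
from statistics import mean


def ensemble_score(scores):
    """Average the per-residue scores of several base classifiers (assumed scheme)."""
    return mean(scores)


def cost_sensitive_label(p, cost_fn=5.0, cost_fp=1.0):
    """Call a residue an epitope residue when the expected cost of missing it
    (p * cost_fn) exceeds the expected cost of a false alarm ((1 - p) * cost_fp).
    Both cost values are hypothetical."""
    return p * cost_fn > (1.0 - p) * cost_fp


def cluster_residues(coords, cutoff=6.0):
    """Single-linkage clustering: residues within `cutoff` of each other
    (directly or through a chain of neighbours) end up in the same cluster."""
    ids = list(coords)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values())


# Toy data (hypothetical): residue index -> scores from three base classifiers.
base_scores = {
    10: [0.30, 0.25, 0.20],
    11: [0.40, 0.35, 0.30],
    12: [0.05, 0.10, 0.05],
    40: [0.50, 0.60, 0.40],
    41: [0.10, 0.05, 0.10],
}
# Toy 3D coordinates in angstroms (hypothetical), one point per residue.
coords = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    12: (7.6, 0.0, 0.0),
    40: (30.0, 0.0, 0.0),
    41: (33.8, 0.0, 0.0),
}

# Step 1 and 2: average the scores, then apply the cost-sensitive decision.
predicted = [r for r, s in base_scores.items()
             if cost_sensitive_label(ensemble_score(s))]

# Step 3: group the predicted residues into candidate epitopes.
epitopes = cluster_residues({r: coords[r] for r in predicted})
print(predicted, epitopes)
```

With a 5:1 cost ratio the decision rule is equivalent to lowering the score threshold from 0.5 to 1/6, which trades specificity for sensitivity on the minority (epitope) class. This is one common way to handle class imbalance and is used here only as a stand-in for the scheme the paper describes.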
Pages: 12