A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

Cited by: 25
Authors
Yang, Runtao [1]
Zhang, Chengjin [1,2]
Zhang, Lina [1]
Gao, Rui [2]
Affiliations
[1] Shandong Univ Weihai, Sch Mech Elect & Informat Engn, Weihai 264209, Peoples R China
[2] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
PSI-BLAST; SEQUENCE; LECTINS; SITES; PROTEINS; DATABASE; ATTRIBUTES; APOPTOSIS; DNA;
DOI
10.1155/2018/9364182
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Cancerlectins inhibit the growth of cancer cells and are currently being employed as therapeutic agents. Accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed to further improve the performance of cancerlectin identification. Before feature selection, a hybrid feature space is built by combining four individual feature spaces: CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. SMOTE (Synthetic Minority Oversampling Technique) is applied to address the imbalanced data problem. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process to select informative features. A 5-fold cross-validation technique is used to evaluate the various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews Correlation Coefficient) of 0.497. The prediction results are also compared with those of other existing methods on the same dataset using 5-fold cross-validation. The comparison results demonstrate the high effectiveness of our method for predicting cancerlectins.
Pages: 10