Feature Selection and Classification of Protein Subfamilies Using Rough Sets

Cited by: 3
Authors
Rahman, Shuzlina Abdul [1]
Abu Bakar, Azuraliza [1]
Hussein, Zeti Azura Mohamed [2]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Syst & Technol, Dept Management Syst & Sci, Bangi 43600, Selangor, Malaysia
[2] Univ Kebangsaan Malaysia, Fac Sci & Technol, Sch Biosci & Biotechnol, Bangi 43600, Selangor, Malaysia
Source
2009 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS, VOLS 1 AND 2 | 2009
Keywords
Feature Selection; Protein Function Classification; Rough Sets; PREDICTION; SEQUENCE;
DOI
10.1109/ICEEI.2009.5254822
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. To cope with this issue, many methods have been proposed for selecting important features. Among them is feature selection, which selects a subset of discriminative features or attributes for model building and is valued for its ability to avoid overfitting, improve model performance, and produce faster and more reliable models. This paper proposes a new method based on Rough Set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so feature selection is unnecessary. We do confirm the need for balanced classes when classifying protein function, demonstrating an increase of more than 15% in accuracy.
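As an illustration of the conditional features mentioned in the abstract, the sketch below computes an amino acid composition vector (the fraction of each of the 20 standard residues) from a protein sequence. It is a minimal example under stated assumptions, not the authors' implementation: the function name `amino_acid_composition`, the handling of non-standard residue codes (they are ignored), and the toy sequence are all hypothetical choices made here.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: characters outside the 20 standard codes (e.g. 'X', 'B')
    are skipped and do not count towards the total.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Residues that never occur get a fraction of 0.0 (Counter returns 0).
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    composition = amino_acid_composition("MKVLAAGIVK")
    for aa, fraction in composition.items():
        if fraction > 0:
            print(f"{aa}: {fraction:.2f}")
```

Each protein then becomes a 20-dimensional feature vector, which is the kind of input a rule-based classifier or a feature selection step would operate on.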
Pages: 32-35
Number of pages: 4