CARSVM: A class association rule-based classification framework and its application to gene expression data

Cited by: 12
Authors
Kianmehr, Keivan [1]
Alhajj, Reda [1]
Affiliation
[1] Univ Calgary, Dept Comp Sci, BIDEALS Grp, Calgary, AB T2N 1N4, Canada
Keywords
Machine learning; Association rule mining; Associative classifiers; Support vector machine; Data mining; Gene expression analysis; Gene expression classification; Gene selection;
DOI
10.1016/j.artmed.2008.05.002
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Objective: In this study, we aim to build a classification framework, the CARSVM model, which integrates association rule mining and the support vector machine (SVM). The goal is to benefit from the advantages of both, namely the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, in order to construct an efficient and accurate classifier that mitigates the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms.
Method: In our proposed framework, instead of the original training set, a set of rule-based feature vectors, generated from the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors are a high-quality source of discriminative knowledge that can substantially improve the prediction power of SVM and associative classification techniques. They also make the model easier for users to understand and interpret.
Results: We used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because gene expression analysis is an important and popular real-world application of classification, we present an extension of CARSVM, combined with feature selection, for gene expression data. We then describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model.
Conclusion: From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated into the learning process of the SVM algorithm. Regarding applicability, the results obtained from gene expression analysis indicate that the CARSVM system can be used in a variety of real-world applications with some adjustments. © 2008 Elsevier B.V. All rights reserved.
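The idea of rule-based feature vectors described in the Method section can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how each feature value is computed, so the weighting used here (the rule's confidence when the sample satisfies the rule's antecedent, 0 otherwise) is an assumption, and the `Rule` type, the `rule_feature_vector` function, and the gene items are all hypothetical, made-up illustrations.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical rule representation: (antecedent items, class label, confidence).
Rule = Tuple[FrozenSet[str], str, float]


def rule_feature_vector(sample_items: FrozenSet[str], rules: List[Rule]) -> List[float]:
    """Map one sample to a vector with one feature per class association rule.

    Assumed weighting (not taken from the paper): the feature equals the
    rule's confidence if the sample contains every antecedent item, else 0.0.
    """
    return [
        confidence if antecedent <= sample_items else 0.0
        for antecedent, _label, confidence in rules
    ]


# Made-up rules and samples, for illustration only.
rules: List[Rule] = [
    (frozenset({"geneA=high"}), "tumor", 0.9),
    (frozenset({"geneB=low", "geneC=high"}), "normal", 0.8),
]
samples = [
    frozenset({"geneA=high", "geneC=high"}),
    frozenset({"geneB=low", "geneC=high"}),
]

# One row per sample, one column per rule.
vectors = [rule_feature_vector(s, rules) for s in samples]
print(vectors)
```

In a pipeline of this kind, `vectors` rather than the raw attribute values would be passed to an SVM trainer. Because each column corresponds to one readable rule, the learned model can be traced back to the rules that drive a prediction.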
Pages: 7-25
Page count: 19