Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets

Cited by: 0
Authors
Costa, Ivan G. [1]
Lorena, Ana C. [2]
Peres, Liciana R. M. P. y [2]
de Souto, Marcilio C. P. [3]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
[2] ABC Fed Univ, Ctr Math Comp & Cognit, Santo Andre, SP, Brazil
[3] Univ Fed Rio Grande do Norte, Dept Informat & Appl Math, BR-59072970 Natal, RN, Brazil
Source
ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS | 2009 / Vol. 5676
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Cancer gene expression classification; Machine Learning; data set complexity; CLASS DISCOVERY; CLASSIFICATION; MICROARRAY;
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Supervised Machine Learning methods have been successfully applied to gene expression based cancer diagnosis. Characteristics intrinsic to cancer gene expression data sets, such as high dimensionality, a low number of samples and the presence of noise, make the classification task very difficult. Furthermore, limitations in classifier performance may often be attributed to characteristics intrinsic to a particular data set. This paper presents an analysis of gene expression data sets for cancer diagnosis using classification complexity measures. Such measures consider data geometry, distribution and linear separability as indications of the complexity of the classification task. The results obtained indicate that the cancer data sets investigated are formed mostly by linearly separable, non-overlapping classes, supporting the good predictive performance of robust linear classifiers, such as SVMs, on the given data sets. Furthermore, we found two complexity indices that were good indicators of the difficulty of gene expression based cancer diagnosis.
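The abstract does not name the individual complexity measures it uses. As an illustration of what such a measure can look like, the sketch below computes the maximum Fisher's discriminant ratio, a widely used index in the data complexity literature (often called F1), for a two-class problem. The function name `fisher_ratio_f1` and the toy data are hypothetical and are not taken from the paper; this is a minimal sketch, not the authors' implementation.

```python
from statistics import mean, pvariance


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio over all features (two classes).

    X: list of samples, each a list of feature values
       (for example, expression levels of each gene).
    y: list of class labels with exactly two distinct values.
    Higher values suggest that at least one feature separates the classes well.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles two-class problems only")

    # Split the samples by class label.
    class_a = [row for row, lab in zip(X, y) if lab == labels[0]]
    class_b = [row for row, lab in zip(X, y) if lab == labels[1]]

    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        # Squared distance between class means over summed class variances.
        num = (mean(col_a) - mean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den == 0:
            ratio = float("inf") if num > 0 else 0.0
        else:
            ratio = num / den
        best = max(best, ratio)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y_toy = [0, 0, 1, 1]
score = fisher_ratio_f1(X_toy, y_toy)
```

A large ratio on at least one feature is consistent with the kind of well-separated, non-overlapping classes the abstract reports, although the paper's own indices may be defined differently.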
Pages: 48 / +
Page count: 4