Analysis of complexity indices for classification problems: Cancer gene expression data

Cited by: 41
Authors
Lorena, Ana C.
Costa, Ivan G. [1 ]
Spolaor, Newton
de Souto, Marcilio C. P. [1 ]
Affiliation
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
Keywords
Classification; Gene expression data; Complexity indices; Linear separability; BREAST-CANCER; MICROARRAY; SENSITIVITY; PREDICTION; ALGORITHMS; SELECTION; RANKING
DOI
10.1016/j.neucom.2011.03.054
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cancer diagnosis at the molecular level has now been made possible through the analysis of gene expression data. More specifically, machine learning (ML) techniques are usually employed to build automatic diagnosis models (classifiers) from cancer gene expression data. Such data often present characteristics that can have a negative impact on the generalization ability of the resulting classifiers, notably data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract intrinsic complexity information from the data. These measures can be used to analyze, among other things, which characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the induced classifiers. (C) 2011 Elsevier B.V. All rights reserved.
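The abstract does not list the individual complexity indices. As a hedged illustration only, the sketch below computes two measures from the complexity-measure set of Ho and Basu (2002) that relate to properties the abstract names: the maximum Fisher's discriminant ratio (F1), which reflects how well any single feature separates two classes, and the average number of samples per dimension (T2), a simple indicator of data sparsity. Whether the paper uses exactly these formulations is an assumption; the function names `fisher_f1` and `points_per_dimension` are hypothetical, and the sketch assumes a two-class problem with dense numeric features.

```python
import math
from statistics import mean, pvariance


def fisher_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) over all features, two classes.

    Assumed form (Ho & Basu style): max_j (mu1_j - mu2_j)^2 / (var1_j + var2_j).
    X: list of samples, each a list of numeric feature values.
    y: list of labels with exactly two distinct values.
    Larger values mean at least one feature separates the classes well.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch of F1 assumes exactly two classes")
    a, b = labels
    best = 0.0
    for j in range(len(X[0])):
        # Split feature j's values by class label.
        xa = [row[j] for row, lab in zip(X, y) if lab == a]
        xb = [row[j] for row, lab in zip(X, y) if lab == b]
        num = (mean(xa) - mean(xb)) ** 2
        den = pvariance(xa) + pvariance(xb)
        if den == 0:
            # No within-class spread: infinitely separable if the means differ.
            ratio = math.inf if num > 0 else 0.0
        else:
            ratio = num / den
        best = max(best, ratio)
    return best


def points_per_dimension(X):
    """T2: number of samples divided by number of features (data sparsity)."""
    return len(X) / len(X[0])


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.0], [4.0, 6.0]]
    y = [0, 0, 1, 1]
    print(fisher_f1(X, y))          # 8.0 (feature 0 separates the classes)
    print(points_per_dimension(X))  # 2.0
```

In microarray data, where thousands of genes are typically measured on a few tens of samples, T2 falls far below 1, which is one way to quantify the sparsity the abstract refers to. Recomputing such indices after feature selection gives a rough sense of how much the selected subset changes the apparent difficulty of the problem.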
Pages: 33-42
Number of pages: 10