Analysis of complexity indices for classification problems: Cancer gene expression data

Cited by: 41
Authors
Lorena, Ana C.
Costa, Ivan G. [1]
Spolaor, Newton
de Souto, Marcilio C. P. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
Keywords
Classification; Gene expression data; Complexity indices; Linear separability; BREAST-CANCER; MICROARRAY; SENSITIVITY; PREDICTION; ALGORITHMS; SELECTION; RANKING;
DOI
10.1016/j.neucom.2011.03.054
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cancer diagnosis at the molecular level has been made possible through the analysis of gene expression data. More specifically, machine learning (ML) techniques are usually used to build automatic diagnosis models (classifiers) from cancer gene expression data. Such data often present characteristics that can have a negative impact on the generalization ability of the resulting classifiers, among them data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which particular characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the induced classifiers. © 2011 Elsevier B.V. All rights reserved.
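The abstract does not name the complexity indices it studies. As an illustration of what such an index can look like, the sketch below computes the maximum Fisher's discriminant ratio for a two-class dataset, a widely used data complexity measure. Whether the paper uses this particular index is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio_per_feature(X, y):
    """Return (mu_a - mu_b)^2 / (var_a + var_b) for each feature.

    X is a list of samples (each a list of feature values) and y holds one
    class label per sample. This sketch handles exactly two classes.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    ratios = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        num = (mean(a) - mean(b)) ** 2
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            ratios.append(num / denom)
        elif num > 0:
            # Zero within-class spread but distinct means: perfectly separable.
            ratios.append(float("inf"))
        else:
            ratios.append(0.0)
    return ratios


def max_fisher_ratio(X, y):
    """Higher values suggest that at least one feature separates the classes well."""
    return max(fisher_ratio_per_feature(X, y))


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
print(max_fisher_ratio(X, y))
```

A low value on every feature would suggest that no single gene discriminates the classes on its own, which is the kind of property a complexity index is meant to expose before a classifier is trained.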
Pages: 33-42
Page count: 10