Analysis of complexity indices for classification problems: Cancer gene expression data

Cited by: 41
Authors
Lorena, Ana C.
Costa, Ivan G. [1]
Spolaor, Newton
de Souto, Marcilio C. P. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil
Keywords
Classification; Gene expression data; Complexity indices; Linear separability; BREAST-CANCER; MICROARRAY; SENSITIVITY; PREDICTION; ALGORITHMS; SELECTION; RANKING;
DOI
10.1016/j.neucom.2011.03.054
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Currently, cancer diagnosis at a molecular level has been made possible through the analysis of gene expression data. More specifically, one usually uses machine learning (ML) techniques to build, from cancer gene expression data, automatic diagnosis models (classifiers). Cancer gene expression data often present some characteristics that can have a negative impact on the generalization ability of the classifiers generated. Some of these properties are data sparsity and an unbalanced class distribution. We investigate the results of a set of indices able to extract the intrinsic complexity information from the data. Such measures can be used to analyze, among other things, which particular characteristics of cancer gene expression data most affect the prediction ability of support vector machine classifiers. In this context, we also show that, by applying a proper feature selection procedure to the data, one can reduce the influence of those characteristics on the error rates of the classifiers induced. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 33-42
Number of pages: 10
Related papers
50 in total
  • [1] Decision forest for classification of gene expression data
    Huang, Jianping
    Fang, Hong
    Fan, Xiaohui
    COMPUTERS IN BIOLOGY AND MEDICINE, 2010, 40 (08) : 698 - 704
  • [2] A survey on gene expression data analysis using deep learning methods for cancer diagnosis
    Ravindran, U.
    Gunavathi, C.
    PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY, 2023, 177 : 1 - 13
  • [3] Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets
    Costa, Ivan G.
    Lorena, Ana C.
    Peres, Liciana R. M. P. y
    de Souto, Marcilio C. P.
    ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5676 : 48 - +
  • [4] Risk classification of cancer survival using ANN with gene expression data from multiple laboratories
    Chen, Yen-Chen
    Ke, Wan-Chi
    Chiu, Hung-Wen
    COMPUTERS IN BIOLOGY AND MEDICINE, 2014, 48 : 1 - 7
  • [5] A genetic filter for cancer classification on gene expression data
    Kim, Yong-Hyuk
    Yoon, Yourim
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2015, 26 : S1993 - S2002
  • [6] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,
  • [7] Comparison of linear discriminant analysis methods for the classification of cancer based on gene expression data
    Huang, Desheng
    Quan, Yu
    He, Miao
    Zhou, Baosen
    JOURNAL OF EXPERIMENTAL & CLINICAL CANCER RESEARCH, 2009, 28
  • [8] Cancer classification using gene expression data
    Lu, Y
    Han, JW
    INFORMATION SYSTEMS, 2003, 28 (04) : 243 - 268
  • [9] Cancer Classification Using Gene Expression Data
    Sonsare, Pravinkumar
    Mujumdar, Aarya
    Joshi, Pranjali
    Morayya, Nipun
    Hablani, Sachal
    Khergade, Vedant
    SMART TRENDS IN COMPUTING AND COMMUNICATIONS, VOL 1, SMARTCOM 2024, 2024, 945 : 1 - 11
  • [10] Gene expression studies with DGL global optimization for the molecular classification of cancer
    Li, Dongguang
    SOFT COMPUTING, 2011, 15 (01) : 111 - 129