Classification of Multi-class Microarray Cancer Data Using Ensemble Learning Method

Cited by: 2

Authors:

Shekar, B. H. [1]

Dagnew, Guesh [1]

Affiliations:

[1] Mangalore Univ, Dept Comp Sci, Mangalore, Karnataka, India

Source:

DATA ANALYTICS AND LEARNING | 2019 / Vol. 43

Keywords:

Feature selection; Dimensionality reduction; Ensemble learning; Microarray cancer data classifier; FEATURE-SELECTION; GENE SELECTION;

DOI:

10.1007/978-981-13-2514-4_24

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Microarray cancer analysis is currently one of the leading research areas in machine learning, computational biology, and pattern recognition. Classifying cancer data into their respective classes plays a key role in diagnosis and treatment: in the binary case, it identifies negative and positive cases; in the multi-class case, the aim is to identify the type of cancer. The main challenges in microarray cancer datasets are the curse of dimensionality and the lack of sufficient samples. To overcome these problems, feature selection and dimensionality reduction are explored to identify relevant features. In this work, we propose an ensemble learning method for multi-class cancer data classification. Information Gain (IG) is used for feature selection; it ranks attributes according to their relevance with respect to the class label. Three classifiers are used: k-Nearest Neighbor, Logistic Regression, and Random Forest. Tenfold cross-validation is applied to train and test the model. Experiments are conducted on standard multi-class cancer datasets, namely Leukemia 3 class, Leukemia 4 class, Harvard Lung cancer 5 class, and MLL 3 class. Model performance is evaluated using Classification Accuracy, F1-measure, and Area Under the Curve (AUC). A confusion matrix is used to show whether samples are correctly classified. The classifiers are compared on the basis of these evaluation criteria. Feature selection yields a significant performance improvement for all three classifiers, with one exception: Random Forest on MLL Leukemia performs better on the original dataset than on the selected features. On the remaining datasets, all classifiers achieve better results with feature selection.
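
The Information Gain ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the median-split binarization of each feature, the function names (entropy, information_gain, rank_features), and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """IG of one numeric feature, binarized at its median (an assumption)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    low = [y for v, y in zip(values, labels) if v <= median]
    high = [y for v, y in zip(values, labels) if v > median]
    # Weighted entropy of the class label after splitting on the feature.
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest IG, plus the gains."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return order, gains


# Toy data (hypothetical, not from the paper): 6 samples, 3 "genes", 3 classes.
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.5, 5.0, 1.0],
    [0.6, 1.0, 1.0],
    [0.9, 5.0, 1.0],
    [1.0, 1.0, 1.0],
]
labels = ["A", "A", "B", "B", "C", "C"]

order, gains = rank_features(rows, labels)
print("ranking:", order)
print("gains:", [round(g, 4) for g in gains])
```

In this toy example only the first feature varies with the class label, so it receives the highest score; the other two carry no information about the class. A real microarray pipeline would keep the top-ranked features and pass them to the classifiers, and the discretization strategy would need to be chosen with more care than a single median split.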

Citation:

Pages: 279-292

Page count: 14
