Methods to Transform Microarray Data for Cancer Prediction

Cited by: 0
Authors
Pattanateepapon, Anuchate [1 ]
Suwansantisuk, Watcharapan [1 ]
Kumhom, Pinit [1 ]
Affiliations
[1] King Mongkuts Univ Technol, Fac Engn, Dept Elect & Telecommun Engn, Thonburi, Thailand
Source
2016 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB) | 2016
Keywords
feature selection; support vector machine; gene-expression microarray data; data transform; DIMENSION REDUCTION; GENE; TUMOR;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cancer classification based on microarray data has gained attention in recent years from the bioinformatics community, owing to the high death toll of cancer and the significance of early diagnosis. Among the many steps in cancer classification, one that is underexplored and can significantly affect classification performance is data transformation. We develop two transformation techniques, called the unity-based normalization with min-max interval (UBMI) and the standard score with trimmed mean (SSTM), and compare them with existing techniques in terms of accuracy, sensitivity, and specificity. The results show that our proposed methods outperform the existing methods tested. For example, across all comparisons, the SSTM achieves the highest values of accuracy, sensitivity, and specificity in 73 out of 138 cases. The UBMI is the runner-up, with 51 winning cases. This advantage confirms the ability of the UBMI and SSTM to accentuate the difference between samples of distinct classes, and highlights the importance of data transformation, a step that is otherwise usually overlooked.
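The abstract names the two transforms and the three evaluation measures but does not define them. The sketch below is therefore not the authors' UBMI or SSTM. It shows only the generic building blocks the names suggest: textbook min-max rescaling onto an interval, and a standard score centered on a trimmed mean. The function names, the default interval [0, 1], the 10% trim fraction, and the use of the untrimmed population standard deviation as the spread are all assumptions made for illustration. The metrics function uses the standard confusion-matrix definitions of accuracy, sensitivity, and specificity.

```python
from statistics import mean, pstdev


def unity_based(values, lo=0.0, hi=1.0):
    """Generic min-max rescaling of one gene's values onto [lo, hi].

    Assumption: the interval endpoints are free parameters; the paper's
    UBMI may choose them differently.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no contrast; map it to the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


def trimmed_mean(values, trim=0.1):
    """Mean after dropping a fraction `trim` of values from each tail."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped per tail
    return mean(ordered[k:len(ordered) - k])


def trimmed_standard_score(values, trim=0.1):
    """Standard score centered on the trimmed mean.

    Assumption: the spread is the population standard deviation of all
    values; the paper's SSTM may define it differently.
    """
    center = trimmed_mean(values, trim)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }
```

In a pipeline of the kind the abstract describes, a transform like these would be applied per gene across samples before feature selection and classifier training, and the metrics would be computed on held-out predictions.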
Pages: 7