ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA

Cited by: 0
Authors
Kumaran, Shamini Raja [1]
Othman, Mohd Shahizan [1]
Yusuf, Lizawati Mi [1]
Affiliations
[1] Univ Teknol Malaysia, Sch Comp, Skudai, Johor, Malaysia
Source
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2020 / Vol. 19 / Issue 04
Keywords
Fuzzy C-means; majority vote; missing values; microarray data; data optimisation; IMPUTATION; ALGORITHM;
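DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract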
Missing values are a huge constraint in microarray technologies towards improving and identifying disease-causing genes. Estimating missing values is an undeniable scenario faced by field experts. The imputation method is an effective way to impute the proper values to proceed with the next process in microarray technology. Missing value imputation methods may increase the classification accuracy. Although these methods might predict the values, classification accuracy rates prove the ability of the methods to identify the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to identify the missing values in the data. Using the Majority Vote (MV) and optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values in the data to form more informative and solid data. In order to verify the effectiveness of opt-FCMMV, several experiments were carried out on two publicly available microarray datasets (i.e. Ovary and Lung Cancer) under three missing value mechanisms with five different percentage values in the biomedical domain using Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method functioned efficiently by showcasing the highest accuracy rate as compared to the one without imputations, with imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% for no imputation, 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). Such an outcome indicates that the opt-FCMMV may also be applied in different domains in order to prepare the dataset for various data mining tasks.
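To make the FCM baseline named in the abstract more concrete, the following is a minimal sketch of imputing missing entries with plain Fuzzy C-Means. It is not the authors' opt-FCMMV: the abstract does not specify how the Majority Vote or the PSO step works, so both are left out. The function name `fcm_impute`, its parameters, the column-mean starting fill, and the membership-weighted centroid estimate are all assumptions made for illustration.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Fill NaN entries of X using plain Fuzzy C-Means (illustrative only)."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                       # True where a value is observed
    rng = np.random.default_rng(seed)

    # Assumption: start from column means so every row is a complete vector.
    filled = np.where(mask, X, np.nanmean(X, axis=0))

    # Assumption: initialise centroids from randomly chosen distinct rows.
    centers = filled[rng.choice(len(filled), n_clusters, replace=False)]

    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n_rows, n_clusters).
        d2 = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)            # avoid division by zero

        # Standard FCM membership update with fuzzifier m.
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)

        # Standard FCM centroid update using weights u**m.
        w = u ** m
        centers = (w.T @ filled) / w.sum(axis=0)[:, None]

        # Re-estimate only the missing entries as a weighted mean of centroids.
        estimate = (w @ centers) / w.sum(axis=1, keepdims=True)
        filled = np.where(mask, X, estimate)

    return filled

if __name__ == "__main__":
    demo = np.array([[1.0, 1.1], [0.9, np.nan], [1.1, 0.9],
                     [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])
    print(fcm_impute(demo))
```

Observed entries are never overwritten; only the positions that were NaN change between iterations. In an evaluation like the one the abstract describes, the completed matrix would then be passed to a classifier such as an SVM to compare accuracy against the unimputed data.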
Pages: 459 - 482
Page count: 24
Related papers
50 in total
  • [41] Simplification Method Using K-NN Estimation and Fuzzy C-Means Clustering Algorithm
    Mahdaoui, Abdelaaziz
    Bouazi, A.
    Marhraoui, A. Hsaini
    Sbai, E. H.
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 305 - 318
  • [42] Bias Field Estimation and Segmentation of MRI Images using a Spatial Fuzzy C-means Algorithm
    Adhikari, Sudip Kumar
    Sing, Jamuna Kanta
    Basu, Dipak Kumar
    2016 2ND INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, ENERGY & COMMUNICATION (CIEC), 2016, : 158 - 162
  • [43] An Innovate Hybrid Approach for Residence Price Using Fuzzy C-Means and Machine Learning Techniques
    Papaleonidas, Antonios
    Lykostratis, Konstantinos
    Psathas, Anastasios Panagiotis
    Iliadis, Lazaros
    Giannopoulou, Maria
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 346 - 357
  • [44] Fuzzy C-Means Clustering for Motion Capture Tennis Time-Series Data
    Skublewska-Paszkowska, Maria
    Powroznik, Pawel
    Karczmarek, Pawel
    Lukasik, Edyta
    Smolka, Jakub
    IEEE ACCESS, 2024, 12 : 150975 - 150996
  • [45] User based Collaborative Filtering using fuzzy C-means
    Koohi, Hamidreza
    Kiani, Kourosh
    MEASUREMENT, 2016, 91 : 134 - 139
  • [46] Cancer Classification using Fuzzy C-Means with Feature Selection
    Rachman, Arvan Aulia
    Rustam, Zuherman
    2016 12TH INTERNATIONAL CONFERENCE ON MATHEMATICS, STATISTICS, AND THEIR APPLICATIONS (ICMSA), 2016, : 31 - 34
  • [47] Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model
    Sefidian, Amir Masoud
    Daneshpour, Negin
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 115 : 68 - 94
  • [48] Performance of the K-means and fuzzy C-means algorithms in big data analytics
    Salman Z.
    Alomary A.
    International Journal of Information Technology, 2024, 16 (1) : 465 - 470
  • [49] Optimizing of Fuzzy C-Means Clustering Algorithm Using GA
    Alata, Mohanad
    Molhim, Mohammad
    Ramini, Abdullah
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 29, 2008, 29 : 224 - 229
  • [50] Effectiveness of Different Partition Based Clustering Algorithms for Estimation of Missing Values in Microarray Gene Expression Data
    Bose, Shilpi
    Das, Chandra
    Chakraborty, Abirlal
    Chattopadhyay, Samiran
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 2, 2013, 177 : 37 - +