Usage of Clustering and Weighted Nearest Neighbors for Efficient Missing Data Imputation of Microarray Gene Expression Dataset

Cited by: 0
Authors
Dubey, Aditya [1]
Rasool, Akhtar [1]
Affiliation
[1] Maulana Azad Natl Inst Technol, Dept Comp Sci & Engn, Bhopal 462003, India
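Keywords
clustering; imputation; missing completely at random; mutual KNN; univariate; PREDICTION; CANCER
DOI
10.1002/adts.202200460
Chinese Library Classification
O [Mathematical and Physical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract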
A complete dataset is essential for most bioinformatics analytical techniques, including gene expression data categorization, prognosis, and prediction. Gene sample values may be missing because of sensor malfunction, software failure, or human error. In gene expression experiments, missing data strongly affect the analysis of the data obtained, so an efficient imputation technique is needed. This research presents a technique for predicting missing values that combines clustering with a top-K nearest neighbor method and takes the local similarity pattern into account. The K-means method is integrated with a spectral clustering methodology. After the clustering parameters, cluster size, and weighting criteria are optimized, the missing gene sample values are estimated. The top-K nearest neighbor method uses a weighted distance to predict each missing gene sample value that falls in a specific cluster. Experimental outcomes show that the proposed imputation methodology produces efficient predictions compared with existing imputation techniques. Microarray datasets comprising information from various cancers and tumors are used to evaluate the imputation performance. The primary contribution of this work is to show that local similarity-based approaches can be employed for missing value prediction even when microarray datasets vary in dimensions and features.
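
The cluster-then-weighted-neighbor idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain K-means for the paper's combined K-means and spectral clustering, and it assumes inverse-distance weighting because the abstract does not state the weighting formula. All names (`observed_distance`, `kmeans_labels`, `impute`) and the toy matrix are hypothetical.

```python
import math

def observed_distance(a, b):
    """RMS difference over columns observed (not None) in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared observed columns
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def kmeans_labels(rows, n_clusters, n_iter=20):
    """Plain K-means that tolerates None entries (assumption: the first
    n_clusters rows serve as deterministic initial centroids)."""
    centroids = [list(rows[c]) for c in range(n_clusters)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each row to the closest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: observed_distance(r, centroids[c]))
            for r in rows
        ]
        # Recompute each centroid column from the observed member values.
        for c in range(n_clusters):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                continue
            for j in range(len(rows[0])):
                vals = [r[j] for r in members if r[j] is not None]
                centroids[c][j] = sum(vals) / len(vals) if vals else None
    return labels

def impute(rows, n_clusters=2, k=2, eps=1e-9):
    """Fill each None with an inverse-distance-weighted mean of the top-k
    nearest rows from the same cluster that observe that column."""
    labels = kmeans_labels(rows, n_clusters)
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = [
                (observed_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and labels[m] == labels[i] and other[j] is not None
            ]
            candidates = [(d, v) for d, v in candidates if math.isfinite(d)]
            candidates.sort(key=lambda t: t[0])
            top = candidates[:k]
            if not top:
                continue  # no usable neighbor: leave the value missing
            weights = [1.0 / (d + eps) for d, _ in top]
            result[i][j] = sum(w * v for w, (_, v) in zip(weights, top)) / sum(weights)
    return result

# Toy expression matrix (rows = genes, columns = samples); None marks a missing value.
rows = [
    [1.0, 1.1, 0.9],
    [5.0, 5.2, 4.9],
    [1.1, None, 1.0],
    [0.9, 1.0, 1.1],
    [5.1, 5.0, None],
    [4.9, 5.1, 5.0],
]
filled = impute(rows, n_clusters=2, k=2)
```

In this toy matrix the two missing entries are estimated only from rows in the same cluster, so the low-expression row is filled from low-expression neighbors and the high-expression row from high-expression ones. Restricting the neighbor search to one cluster is what makes the estimate depend on local rather than global similarity.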
Pages: 12