Microarray Missing Values Imputation Methods: Critical Analysis Review

Cited by: 7
Authors
Hourani, Mou'ath [1]
El Emary, Ibrahiem M. M. [2]
Affiliations
[1] Al Ahliyya Amman Univ, Fac Informat Technol, Amman 19328, Jordan
[2] Al Ahliyya Amman Univ, Fac Engn, Amman 19328, Jordan
Keywords
Missing Completely At Random (MCAR); Missing At Random (MAR); Sequential K-Nearest Neighbors (SKNN); Gene Ontology (GO); Singular Value Decomposition (SVD); Least Squares Imputation (LSI); Local Least Square Imputation (LLSI); Bayesian Principal Component Analysis (BPCA); Fixed Rank Approximation Method (FRAA); Gene expression
DOI
10.2298/csis0902165H
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Gene expression data often contain missing expression values. Because many algorithms for gene expression data analysis, including clustering, require a complete matrix of gene array values, choosing the most effective missing value estimation method is necessary. In this paper, the most commonly used imputation methods from the literature are critically reviewed and analyzed to explain the proper use and the weaknesses of each published method and to record observations on it. From this analysis, we conclude that the Local Least Squares (LLS) and Support Vector Regression (SVR) algorithms achieve the best performance. SVR can be considered a complement to LLS, especially when applied to noisy data. However, both algorithms have deficiencies in choosing the number of selected genes (K) and the appropriate kernel function. To overcome these drawbacks, a new method is needed that chooses these parameters automatically and has an acceptable computational complexity.
Pages: 165-190
Number of pages: 26