Improving missing value imputation of microarray data by using spot quality weights

Cited by: 17
Authors
Johansson, Peter [1]
Hakkinen, Jari [1]
Affiliation
[1] Lund Univ, Dept Theoret Phys, SE-22362 Lund, Sweden
DOI
10.1186/1471-2105-7-306
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: Microarray technology has become popular for gene expression profiling, and many analysis tools have been developed for data interpretation. Most of these tools require complete data, but measurement values are often missing. One way to overcome the problem of incomplete data is to impute the missing values before analysis. Many imputation methods have been suggested, some naive and others more sophisticated, taking into account correlations in the data. However, these methods are binary in the sense that each spot is considered either missing or present. Hence, they depend on a cutoff separating poor spots from good spots. We suggest a different approach in which a continuous spot quality weight is built into the imputation methods, allowing for smooth imputation of all spots to a greater or lesser degree. Results: We assessed several imputation methods on three data sets containing replicate measurements, and found that weighted methods performed better than non-weighted methods. Of the compared methods, the best performance and robustness were achieved with the weighted nearest neighbours method (WeNNI), in which both spot quality and correlations between genes were included in the imputation. Conclusion: Including a measure of spot quality improves the accuracy of missing value imputation. WeNNI, the proposed method, is more accurate and less sensitive to parameters than the widely used kNNimpute and LSimpute algorithms.
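To make the idea of a continuous spot quality weight concrete, the following is a minimal sketch of a weighted nearest-neighbour imputation. It is an illustration under stated assumptions, not the WeNNI algorithm as published: the function name `weighted_knn_impute`, the parameters `k` and `eps`, the weighted squared-distance measure, the inverse-distance averaging, and the final blend `w * x + (1 - w) * estimate` are all choices made for this example. Weights are assumed to lie in [0, 1], with 0 meaning a missing spot.

```python
import numpy as np

def weighted_knn_impute(x, w, k=2, eps=1e-12):
    """Illustrative weighted kNN imputation (hypothetical helper, not the paper's code).

    x: genes x samples array of expression values (NaN allowed where w == 0).
    w: genes x samples array of spot quality weights in [0, 1].
    """
    w = np.asarray(w, dtype=float)
    # Values with zero weight carry no information; replace them (including NaN) by 0.
    x = np.where(w > 0, np.asarray(x, dtype=float), 0.0)
    n_genes = x.shape[0]
    out = np.empty_like(x)
    for g in range(n_genes):
        # Pairwise weights: a sample counts only if both spots are of good quality.
        ww = w[g] * w
        norm = ww.sum(axis=1)
        dist = np.full(n_genes, np.inf)
        ok = norm > 0
        dist[ok] = (ww[ok] * (x[g] - x[ok]) ** 2).sum(axis=1) / norm[ok]
        dist[g] = np.inf  # a gene is never its own neighbour
        nn = np.argsort(dist)[:k]
        nn = nn[np.isfinite(dist[nn])]
        # Neighbours contribute in proportion to their spot quality and closeness.
        contrib = w[nn] / (dist[nn, None] + eps)
        denom = contrib.sum(axis=0)
        safe = np.where(denom > 0, denom, 1.0)
        # Fall back to the gene's own value when no neighbour has a usable spot.
        est = np.where(denom > 0, (contrib * x[nn]).sum(axis=0) / safe, x[g])
        # Smooth blend: good spots keep their value, poor spots lean on the estimate.
        out[g] = w[g] * x[g] + (1.0 - w[g]) * est
    return out
```

With all weights restricted to 0 or 1, this sketch reduces to a binary scheme in which only missing spots are replaced; intermediate weights move every spot part of the way toward its neighbour-based estimate.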
Pages: 10