Missing value imputation strategies for metabolomics data

Cited by: 122
Authors
Armitage, Emily Grace [1]
Godzien, Joanna [1]
Alonso-Herranz, Vanesa [1]
Lopez-Gonzalvez, Angeles [1]
Barbas, Coral [1]
Affiliation
[1] Univ CEU San Pablo, Fac Farm, Ctr Metabol & Bioanal CEMBIO, Madrid 28668, Spain
Keywords
CE-MS; Data; False-discovery rate; Imputation; k-nearest neighbour; Metabolomics; Missing values
DOI
10.1002/elps.201500352
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Missing values can arise for different reasons, and depending on their origin they should be considered and dealt with in different ways. In this research, four methods of imputation have been compared with respect to their effects on the normality and variance of data, on statistical significance and on the approximation of a suitable threshold to accept missing data as truly missing. Additionally, different strategies for controlling the familywise error rate or false discovery rate, and how they work with the different strategies for missing value imputation, have been evaluated. Missing values were found to affect the normality and variance of data, and k-means nearest neighbour imputation was the best method tested for restoring these. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that as little as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and a strategy has therefore been proposed that balances the optimal imputation strategy, which was k-means nearest neighbour, against the best approximation of where real zeros lie.
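
The ideas named in the abstract (banding variables by their percentage of missing values, nearest-neighbour imputation, and Bonferroni correction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of `None` for a missing value, the distance measure, the choice of `k`, and the handling of the exact band edges are all assumptions made for illustration; only the 40% and 70% figures come from the abstract.

```python
import math


def missing_fraction(values):
    """Fraction of entries in one variable that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def classify_variable(values, low=0.40, high=0.70):
    """Band a variable by its missing fraction.

    The 40% and 70% edges follow the abstract; the band labels and the
    treatment of values exactly on an edge are assumptions.
    """
    frac = missing_fraction(values)
    if frac < low:
        return "impute"
    if frac <= high:
        return "gray"
    return "truly_missing"


def knn_impute(samples, k=2):
    """Fill None entries with the mean of the k nearest samples.

    samples: list of rows (one per sample), each a list of variable values.
    Distance is the root mean squared difference over the variables that
    both rows have observed (an assumed, simple choice).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [row[:] for row in samples]
    for i, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other samples in which variable j was observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(samples)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def bonferroni(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [1.2, 2.2, 3.4],
    ]
    print(knn_impute(data, k=2)[1])
    print(classify_variable([1.0, None, None, 2.0, 3.0]))
    print(bonferroni([0.001, 0.02, 0.04]))
```

In this sketch a variable in the "impute" band would be passed to `knn_impute`, while what to do with the "gray" and "truly_missing" bands is left to the caller, since the abstract does not spell out those steps.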
Pages: 3050-3060
Number of pages: 11