Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data

Cited by: 11
Authors
Taylor, Sandra [1]
Ponzini, Matthew [1]
Wilson, Machelle [1]
Kim, Kyoungmi [1]
Affiliations
[1] Univ Calif Davis, Sch Med, Div Biostat, Davis, CA 95616 USA
Keywords
metabolomics; mass spectrometry; missing data; imputation; sample size; EXPRESSION;
DOI
10.1093/bib/bbab353
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Missing values are common in high-throughput mass spectrometry data. Two strategies are available to address missing values: (i) eliminate or impute the missing values and apply statistical methods that require complete data and (ii) use statistical methods that specifically account for missing values without imputation (imputation-free methods). This study reviews the effect of sample size and percentage of missing values on statistical inference for multiple methods under these two strategies. With increasing missingness, the ability of imputation and imputation-free methods to identify differentially and non-differentially regulated compounds in a two-group comparison study declined. Random forest and k-nearest neighbor imputation combined with a Wilcoxon test performed well in statistical testing for up to 50% missingness with little bias in estimating the effect size. Quantile regression imputation accompanied with a Wilcoxon test also had good statistical testing outcomes but substantially distorted the difference in means between groups. None of the imputation-free methods performed consistently better for statistical testing than imputation methods.
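As an illustration of the first strategy described above (impute, then apply a test that requires complete data), the following is a minimal sketch of k-nearest neighbor imputation followed by a Wilcoxon rank-sum test. It is not the authors' implementation: the function names `knn_impute` and `rank_sum_test`, the simulated data, the missing-completely-at-random masking, the choice of k = 3, and the normal approximation without continuity correction are all assumptions made for this example.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""
    def dist(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j was observed.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # stays None if no other row observed feature j
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        for m in range(i, j):
            ranks[m] = (i + 1 + j) / 2  # midrank of the tied block
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 0.0, 1.0
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Simulated two-group data (hypothetical): feature 0 is shifted in group B.
random.seed(0)
group_a = [[random.gauss(10, 1) for _ in range(4)] for _ in range(15)]
group_b = [[random.gauss(10, 1) + (1.5 if j == 0 else 0.0) for j in range(4)]
           for _ in range(15)]
data = group_a + group_b
# Mask roughly 20% of values completely at random (an assumption of this sketch).
masked = [[None if random.random() < 0.2 else v for v in row] for row in data]

imputed = knn_impute(masked, k=3)
z, p = rank_sum_test([r[0] for r in imputed[:15]], [r[0] for r in imputed[15:]])
print(f"feature 0: z = {z:.2f}, p = {p:.4f}")
```

The sketch pools both groups when searching for neighbors and masks values independently of abundance. Missingness in mass spectrometry data often depends on abundance (for example, values below a detection limit), so results from this toy setup should not be read as reproducing the study's findings.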
Pages: 11