Mixture model normalization for non-targeted gas chromatography/mass spectrometry metabolomics data

Cited by: 29
Authors
Reisetter, Anna C. [1]
Muehlbauer, Michael J. [2,3]
Bain, James R. [2,3]
Nodzenski, Michael [1]
Stevens, Robert D. [2,3]
Ilkayeva, Olga [2,3]
Metzger, Boyd E. [4]
Newgard, Christopher B. [2,3]
Lowe, William L., Jr. [4]
Scholtens, Denise M. [1]
Affiliations
[1] Northwestern Univ, Feinberg Sch Med, Div Biostat, Dept Prevent Med, Chicago, IL 60611 USA
[2] Duke Univ, Med Ctr, Sarah W Stedman Nutr & Metab Ctr, Durham, NC 27701 USA
[3] Duke Univ, Sch Med, Durham, NC 27701 USA
[4] Northwestern Univ, Feinberg Sch Med, Div Endocrinol, Dept Med, Chicago, IL 60611 USA
Keywords
Metabolomics; Non-targeted; Gas chromatography/mass spectrometry; GC/MS; Normalization; Batch effects; MASS-SPECTROMETRY; LARGE-SCALE; WEIGHT-LOSS; PREGNANCY; SAMPLES; PLASMA; SERUM; BIOINFORMATICS; BIOCONDUCTOR; METABOLITES
DOI
10.1186/s12859-017-1501-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: Metabolomics offers a unique integrative perspective for health research, reflecting genetic and environmental contributions to disease-related phenotypes. Identifying robust associations in population-based or large-scale clinical studies demands large numbers of subjects and therefore sample batching for gas chromatography/mass spectrometry (GC/MS) non-targeted assays. When batches are run over weeks or months, technical noise due to batch and run order threatens data interpretability. Application of existing normalization methods to metabolomics is challenged by unsatisfied modeling assumptions and, notably, by failure to address batch-specific truncation of low-abundance compounds.
Results: To curtail technical noise and make GC/MS metabolomics data amenable to analyses describing biologically relevant variability, we propose mixture model normalization (mixnorm), which accommodates truncated data and estimates per-metabolite batch and run-order effects using quality control samples. Mixnorm outperforms other approaches across many metrics, including improved correlation of non-targeted and targeted measurements and superior performance when metabolite detectability varies according to batch. For some metrics, particularly when truncation is less frequent for a metabolite, mean centering and median scaling demonstrate comparable performance to mixnorm.
Conclusions: When quality control samples are systematically included in batches, mixnorm is uniquely suited to normalizing non-targeted GC/MS metabolomics data due to explicit accommodation of batch effects, run order, and varying thresholds of detectability. Especially in large-scale studies, normalization is crucial for drawing accurate conclusions from non-targeted GC/MS metabolomics data.
Pages: 17