Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling

Times cited: 232
Authors
Di Guida, Riccardo [1 ,2 ]
Engel, Jasper [1 ,3 ]
Allwood, J. William [1 ]
Weber, Ralf J. M. [1 ]
Jones, Martin R. [1 ]
Sommer, Ulf [1 ,3 ]
Viant, Mark R. [1 ,3 ,4 ,5 ]
Dunn, Warwick B. [1 ,2 ,4 ,5 ]
Affiliations
[1] Univ Birmingham, Sch Biosci, Birmingham B15 2TT, W Midlands, England
[2] Univ Birmingham, MRC, ARUK Ctr Musculoskeletal Ageing Res, Birmingham B15 2TT, W Midlands, England
[3] Univ Birmingham, NERC Biomol Anal Facil Metabol Node NBAF B, Birmingham B15 2TT, W Midlands, England
[4] Univ Birmingham, Phenome Ctr Birmingham, Birmingham B15 2TT, W Midlands, England
[5] Univ Birmingham, Inst Metab & Syst Res, Birmingham B15 2TT, W Midlands, England
Funding
Wellcome Trust (UK); Natural Environment Research Council (NERC, UK)
Keywords
UHPLC-MS; Metabolomics; Random forest; KNN; PQN normalisation; Glog transformation; Mass spectrometry data; Liquid chromatography; LC-MS; Serum; Stability; Inference; Strategy; Workflow; 1H NMR; MZmine
DOI
10.1007/s11306-016-1030-9
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Introduction: The generic metabolomics data processing workflow consists of a series of steps including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. Together, these steps should present the experimental data in a structure that allows biological changes to be identified in a valid and robust manner.

Objectives: Different researchers currently apply different data processing methods, and no assessment of the permutations of these methods applied to UHPLC-MS datasets has been published. Here we aim to define the most appropriate data processing workflow.

Methods: We assess the influence of normalisation, missing value imputation, transformation and scaling methods on univariate and multivariate analysis of UHPLC-MS datasets acquired for different mammalian samples.

Results: Our studies have shown that once data are filtered, missing values are not correlated with m/z, retention time or response. Following an exhaustive evaluation, we recommend PQN (probabilistic quotient normalisation) with no missing value imputation and no transformation or scaling for univariate analysis. For PCA we recommend PQN normalisation with Random Forest missing value imputation, generalised logarithm (glog) transformation and no scaling. For PLS-DA we recommend PQN normalisation, KNN (k-nearest neighbour) missing value imputation, glog transformation and no scaling. These recommendations are based on searching for biologically important metabolite features independently of their measured abundance.

Conclusion: The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method, and choosing the right methods is essential to maximise the biological information derived from UHPLC-MS datasets.
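As an illustration of the processing order recommended above for PLS-DA (PQN normalisation, then KNN imputation, then glog transformation, with no scaling), here is a minimal NumPy sketch. It operates on a samples × features intensity matrix in which NaN marks a missing value. It is not the authors' implementation. The function names, the use of the feature-wise median of all samples as the PQN reference, the sample-wise root-mean-square distance in `knn_impute`, and the fixed `lam` value are all illustrative assumptions. Random Forest imputation, recommended for PCA, is not shown.

```python
import numpy as np


def pqn_normalise(X, reference=None):
    """Probabilistic quotient normalisation of a samples x features matrix.

    NaN marks a missing value and is ignored by the medians. Intensities are
    assumed positive. Returns the normalised matrix and per-sample dilution factors.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        # Assumption: reference profile = feature-wise median of all samples
        # (a median of QC samples is a common alternative).
        reference = np.nanmedian(X, axis=0)
    quotients = X / reference                   # quotient of every feature vs. reference
    factors = np.nanmedian(quotients, axis=1)   # most probable dilution factor per sample
    return X / factors[:, None], factors


def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature in the k nearest samples.

    Distance between two samples is the root-mean-square difference over the
    features both have observed (a hypothetical choice for this sketch).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    n_samples = X.shape[0]
    for i in np.where(~observed.all(axis=1))[0]:
        dists = np.full(n_samples, np.inf)      # inf = not a usable neighbour
        for j in range(n_samples):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for f in np.where(~observed[i])[0]:
            # nearest neighbours that actually measured feature f
            donors = [j for j in order if np.isfinite(dists[j]) and observed[j, f]][:k]
            if donors:
                out[i, f] = X[donors, f].mean()
    return out


def glog(X, lam=1.0):
    """Generalised logarithm ln(x + sqrt(x^2 + lam)).

    Behaves like ln(2x) for large x but stays finite near zero when lam > 0.
    """
    X = np.asarray(X, dtype=float)
    return np.log(X + np.sqrt(X ** 2 + lam))


if __name__ == "__main__":
    # Toy data: 4 samples x 3 features, NaN = missing value.
    X = np.array([[1.0, 2.0, np.nan],
                  [2.0, 4.0, 6.0],
                  [1.1, 2.1, 3.3],
                  [0.9, 1.9, 2.8]])
    Xn, factors = pqn_normalise(X)    # 1. normalise
    Xi = knn_impute(Xn, k=2)          # 2. impute missing values
    Z = glog(Xi, lam=1.0)             # 3. transform (no scaling applied)
    print("dilution factors:", np.round(factors, 3))
    print(np.round(Z, 3))
```

PQN divides each sample by the median of its feature-wise quotients against the reference, so a sample that is uniformly diluted or concentrated is rescaled, while a handful of genuinely changing features barely shift that median. The glog parameter `lam` is fixed here only for illustration; in practice it is typically estimated from the data, for example from QC or technical replicate samples.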
Pages: 14