WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis

Times cited: 42
Authors
Deng, Kui [1]
Zhang, Fan [3]
Tan, Qilong [1]
Huang, Yue [1]
Song, Wei [1]
Rong, Zhiwei [1]
Zhu, Zheng-Jiang [2]
Li, Zhenzi [1]
Li, Kang [1]
Affiliations
[1] Harbin Med Univ, Sch Publ Hlth, Dept Epidemiol & Biostat, Harbin 150086, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Organ Chem, Interdisciplinary Res Ctr Biol & Chem, Shanghai 200032, Peoples R China
[3] Harbin Med Univ, Affiliated Hosp 1, Lab Hematol Ctr, Harbin 150086, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Metabolomics; Wavelet transform; Independent component analysis; Batch effect; Data normalization; Mass spectrometry; Normalization methods; Transform; Strategy; Samples; Chromatography; Extraction
DOI
10.1016/j.aca.2019.02.010
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Subject classification codes
070302; 081704
Abstract
Metabolomics provides new insights into disease pathogenesis and biomarker discovery. In large-scale untargeted metabolomics studies, samples are typically analyzed on a liquid chromatography-mass spectrometry (LC-MS) platform in several batches. Batch effects, which are caused by non-biological systematic biases, are unavoidable in such studies even with properly designed experiments, and statistical analysis of large-scale metabolomics data that does not account for them will yield misleading results. In this study, we propose a novel algorithm, WaveICA, which combines the wavelet transform with independent component analysis (ICA), used as the threshold processing step, to capture and remove batch effects from large-scale metabolomics data. WaveICA uses the time trend of samples over the injection order: it decomposes the original data into multi-scale data with different features, extracts and removes the batch-effect information from the multi-scale data, and obtains clean data. WaveICA was tested on real metabolomics data. After WaveICA was applied, the quality control samples (QCS) and the subject samples, which were scattered in the PCA score plot of the original data, each formed a tight cluster. The average Pearson correlation coefficient across all peaks of the QCS increased from 0.872 to 0.972. WaveICA also significantly improved the classification accuracy for metabolomics data. When compared with three representative methods, WaveICA outperformed all of them. In conclusion, WaveICA can efficiently remove batch effects while revealing more biological information, and it can be used to preprocess raw data in large-scale untargeted metabolomics studies. © 2019 Elsevier B.V. All rights reserved.
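The abstract describes WaveICA only at a high level: each feature's intensities, ordered by injection, are decomposed into multi-scale data, the batch-effect information in those scales is extracted (via ICA) and removed, and clean data are obtained. The sketch below illustrates only the decomposition step, under stated assumptions. It uses a plain orthonormal Haar discrete wavelet transform, pads the series with its last value to a length divisible by 2**levels, and introduces hypothetical helper names (`haar_step`, `inverse_haar_step`, `multiscale_components`). The wavelet filter, transform variant, and ICA-based removal used in the paper are not reproduced here.

```python
import math
from typing import List, Optional, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT on an even-length list."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_step."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def multiscale_components(series: List[float], levels: int) -> List[List[float]]:
    """Hypothetical helper: split one feature's intensities (ordered by
    injection) into levels + 1 time-domain components, namely detail scales
    1..levels (finest first) followed by the coarsest approximation.
    Because the transform is linear, the components sum back to the input."""
    if not series or levels < 1:
        raise ValueError("need a non-empty series and levels >= 1")
    n = len(series)
    block = 2 ** levels
    # Assumption: pad with the last value so the length divides 2**levels.
    padded = list(series) + [series[-1]] * (-n % block)

    approx = padded
    details: List[List[float]] = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)

    def reconstruct(keep: Optional[int]) -> List[float]:
        # Inverse transform with every coefficient set zeroed except one:
        # detail level `keep`, or the approximation when keep is None.
        a = approx if keep is None else [0.0] * len(approx)
        for level in reversed(range(levels)):
            d = details[level] if keep == level else [0.0] * len(details[level])
            a = inverse_haar_step(a, d)
        return a[:n]  # drop the padding

    return [reconstruct(j) for j in range(levels)] + [reconstruct(None)]


if __name__ == "__main__":
    # Hypothetical intensities of one peak across 10 injections.
    intensities = [10.0, 10.4, 9.8, 10.1, 12.0, 12.3, 11.9, 12.2, 14.1, 13.8]
    components = multiscale_components(intensities, levels=2)
    rebuilt = [sum(vals) for vals in zip(*components)]
    print(len(components), max(abs(r - x) for r, x in zip(rebuilt, intensities)))
```

In a WaveICA-style pipeline, a decomposition like this would be applied to every feature, and each scale would then go through the ICA-based extraction and removal step before the scales are summed back into cleaned data; that step is deliberately left out of this sketch.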
Pages: 60-69
Page count: 10