Sparse PLS-Based Method for Overlapping Metabolite Set Enrichment Analysis

Cited by: 13
Authors
Deng, Lingli [1,2]
Ma, Lei [2]
Cheng, Kian-Kai [3]
Xu, Xiangnan [4]
Raftery, Daniel [5]
Dong, Jiyang [6]
Affiliations
[1] East China Univ Technol, Jiangxi Engn Technol Res Ctr Nucl Geosci Data Sci, Nanchang 330013, Jiangxi, Peoples R China
[2] East China Univ Technol, Dept Informat Engn, Nanchang 330013, Jiangxi, Peoples R China
[3] Univ Teknol Malaysia, Innovat Ctr Agritechnol, Muar 84600, Johor, Malaysia
[4] Univ Sydney, Sch Math & Stat, Camperdown, NSW 2006, Australia
[5] Univ Washington, Dept Anesthesiol & Pain Med, Northwest Metab Res Ctr, Seattle, WA 98109 USA
[6] Xiamen Univ, Natl Inst Data Sci Hlth & Med, Dept Elect Sci, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
metabolite set enrichment analysis (MSEA); overlapping group partial least squares (ogPLS); debiasing regularization; group lasso; stable selection; MODEL SELECTION; REGRESSION;
DOI
10.1021/acs.jproteome.1c00064
CLC classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Metabolite set enrichment analysis (MSEA) has gained increasing research interest for the identification of perturbed metabolic pathways in metabolomics. The method incorporates predefined metabolic pathway information into the analysis, and the metabolite sets are typically assumed to be mutually exclusive. However, metabolic pathways are known to share common metabolites and intermediates. This, together with limitations in metabolite detection and coverage, leads to overlapping, incomplete metabolite sets in pathway analysis. For overlapping metabolite sets, MSEA tends to produce a high number of false positives because improper weights are allocated to the overlapping metabolites. Here, we propose an extended partial least squares (PLS) model with a new sparsity scheme for overlapping metabolite set enrichment analysis, named overlapping group PLS (ogPLS) analysis. The weight vector of the ogPLS model is decomposed into pathway-specific subvectors, and a group lasso penalty is then imposed on these subvectors to achieve a proper weight allocation for the overlapping metabolites. Two strategies are adopted in the proposed ogPLS model to identify the perturbed metabolic pathways. The first is debiasing regularization, which is used to reduce inequalities among the predefined metabolic pathways. The second is stable selection, which is used to rank pathways while avoiding the nuisance problem of model parameter optimization. Both simulated and real-world metabolomics datasets were used to evaluate the proposed method and to compare it with two other MSEA methods: the Global-test and the multiblock PLS (MB-PLS)-based pathway importance in projection (PIP) method. On a simulated dataset with known perturbed pathways, the average true discovery rate of the ogPLS method was higher than those of the Global-test and the MB-PLS-based PIP methods.
Analysis of a real-world metabolomics dataset also indicated that the developed method was less prone to selecting pathways with highly overlapping detected metabolite sets. Compared with the two other methods, the proposed method achieves higher accuracy and a lower false-positive rate, and it is more robust when applied to overlapping metabolite set analysis. The developed ogPLS method may serve as an alternative MSEA method to facilitate the biological interpretation of metabolomics data with overlapping metabolite sets.
Pages: 3204-3213
Number of pages: 10