Utilizing machine learning with knockoff filtering to extract significant metabolites in Crohn's disease with a publicly available untargeted metabolomics dataset

Cited by: 3
Authors
Bin Masud, Shoaib [1]
Jenkins, Conor [3]
Hussey, Erika [2]
Elkin-Frankston, Seth [2]
Mach, Phillip [3]
Dhummakupt, Elizabeth [3]
Aeron, Shuchin [1]
Affiliations
[1] Tufts Univ, Dept Elect & Comp Engn, Medford, MA 02155 USA
[2] DEVCOM Soldier Ctr, Natick, MA USA
[3] DEVCOM Chem Biol Ctr, Aberdeen, MD 21010 USA
Keywords
INFLAMMATORY-BOWEL-DISEASE; POST-SELECTION INFERENCE; FALSE DISCOVERY RATE; MULTIVARIATE-ANALYSIS; MASS-SPECTROMETRY; STRATEGIES; BIOMARKERS; VALUES;
DOI
10.1371/journal.pone.0255240
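Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract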
Metabolomic data processing pipelines have been improving in recent years, allowing for greater feature extraction and identification. Lately, machine learning and robust statistical techniques to control false discoveries are being incorporated into metabolomic data analysis. In this paper, we introduce one such recently developed technique called aggregate knockoff filtering to untargeted metabolomic analysis. When applied to a publicly available dataset, aggregate knockoff filtering combined with typical p-value filtering improves the number of significantly changing metabolites by 25% when compared to conventional untargeted metabolomic data processing. By using this method, features that would normally not be extracted under standard processing would be brought to researchers' attention for further analysis.
Pages: 13