VOCCluster: Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography/Mass Spectrometry Data

Cited by: 27
Authors
Alkhalifah, Yaser [1]
Phillips, Iain [1]
Soltoggio, Andrea [1]
Darnley, Kareen [3]
Nailon, William H. [3]
McLaren, Duncan [3]
Eddleston, Michael [4]
Thomas, C. L. Paul [2]
Salman, Dahlia [2]
Affiliations
[1] Loughborough Univ, Dept Comp Sci, Loughborough LE11 3TU, Leics, England
[2] Loughborough Univ, Dept Chem, Loughborough LE11 3TU, Leics, England
[3] NHS Lothian, Edinburgh Canc Ctr, Edinburgh EH4 2SP, Midlothian, Scotland
[4] Univ Edinburgh, Pharmacol Toxicol & Therapeut Unit, Edinburgh EH8 9YL, Midlothian, Scotland
Funding
EU Horizon 2020;
Keywords
IDENTIFICATION; METABOLITES; ALGORITHMS; SPECTRA;
DOI
10.1021/acs.analchem.9b03084
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Subject classification codes
070302; 081704;
Abstract
Metabolic profiling of breath analysis involves processing, alignment, scaling, and clustering of thousands of features extracted from gas chromatography/mass spectrometry (GC/MS) data from hundreds of participants. The multistep data processing is complicated, operator error-prone, and time-consuming. Automated algorithmic clustering methods that are able to cluster features in a fast and reliable way are necessary. These accelerate metabolic profiling and discovery platforms for next-generation medical diagnostic tools. Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC/MS breath data. VOCCluster was created from a heuristic ontology based on the observation of experts undertaking data processing with a suite of software packages. VOCCluster identifies and clusters groups of volatile organic compounds (VOCs) from deconvolved GC/MS breath with similar mass spectra and retention index profiles. VOCCluster was used to cluster more than 15 000 features extracted from 74 GC/MS clinical breath samples obtained from participants with cancer before and after a radiation therapy. Results were evaluated against a panel of ground truth compounds and compared to other clustering methods (DBSCAN and OPTICS) that were used in previous metabolomics studies. VOCCluster was able to cluster those features into 1081 groups (including endogenous and exogenous compounds and instrumental artifacts) with an accuracy rate of 96% (+/- 0.04 at 95% confidence interval).
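The abstract describes grouping features that share both a similar mass spectrum and a similar retention index. The following is a minimal Python sketch of that general idea only; it is not the VOCCluster algorithm, whose details are not given in this record. The function names, the greedy seed-based assignment, the retention index tolerance of 5 units, and the cosine similarity threshold of 0.9 are all illustrative assumptions, and the toy spectra are invented for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two intensity vectors on a shared m/z grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_features(features, ri_tolerance=5.0, min_similarity=0.9):
    """Greedily group (retention_index, spectrum) features.

    A feature joins the first existing cluster whose seed feature is within
    ri_tolerance retention index units AND whose seed spectrum has cosine
    similarity >= min_similarity; otherwise it seeds a new cluster.
    Both thresholds are hypothetical values chosen for illustration.
    """
    clusters = []  # each entry: seed retention index, seed spectrum, member indices
    for idx, (ri, spectrum) in enumerate(features):
        for cluster in clusters:
            close_in_ri = abs(ri - cluster["ri"]) <= ri_tolerance
            similar = cosine_similarity(spectrum, cluster["spectrum"]) >= min_similarity
            if close_in_ri and similar:
                cluster["members"].append(idx)
                break
        else:
            clusters.append({"ri": ri, "spectrum": spectrum, "members": [idx]})
    return [cluster["members"] for cluster in clusters]


# Toy data (invented): four features on a four-channel m/z grid.
features = [
    (800.0, [100, 50, 0, 10]),   # compound A
    (802.0, [98, 52, 1, 9]),     # compound A again, slight drift
    (801.0, [0, 5, 100, 40]),    # co-eluting, but a different spectrum
    (1200.0, [100, 50, 0, 10]),  # same spectrum, very different retention index
]

groups = cluster_features(features)
print(groups)
```

Requiring both criteria is the point of the sketch: the third feature elutes next to the first two but has a different spectrum, and the fourth shares a spectrum but elutes far away, so each ends up in its own group. Density-based methods such as DBSCAN and OPTICS, which the abstract uses as comparisons, approach the same grouping problem through neighborhood density rather than seed matching.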
Pages: 2937-2945
Page count: 9