Development of variance rank initiated-unsupervised sample indexing for gas chromatography-mass spectrometry analysis

Cited by: 6
Authors
Cain, Caitlin N. [1 ]
Sudol, Paige E. [1 ]
Berrier, Kelsey L. [1 ]
Synovec, Robert E. [1 ]
Affiliation
[1] Univ Washington, Dept Chem, Box 351700, Seattle, WA 98195 USA
Funding
US National Science Foundation
Keywords
Unsupervised; Variance rank initiated-unsupervised sample indexing; Gas chromatography-mass spectrometry; Chemometrics; Exploratory data analysis; FISHER RATIO ANALYSIS; GC-TOFMS DATA; PIECEWISE ALIGNMENT; PATTERN-RECOGNITION; FEATURE-SELECTION; CLASSIFICATION; METABOLOMICS; METABOLITES
DOI
10.1016/j.talanta.2021.122495
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Traditional non-targeted chemometric workflows for gas chromatography-mass spectrometry (GC-MS) data rely on supervised methods, which require a priori knowledge of sample class membership. Herein, we propose a simple, unsupervised chemometric workflow known as variance rank initiated-unsupervised sample indexing (VRI-USI). VRI-USI first discovers analyte peaks that exhibit high relative variance across all samples, then applies k-means clustering to each discovered peak individually. Each peak receives a sample index assignment based on how the samples cluster for that peak. By a probabilistic argument, if the same sample index assignment appears for several discovered peaks, this outcome strongly suggests that the samples are properly classified by that particular sample index assignment. Thus, relevant chemical differences between the samples are discovered in an unsupervised fashion. The VRI-USI workflow is demonstrated on three increasingly difficult datasets: simulations, yeast metabolomics, and human cancer metabolomics. For simulated GC-MS datasets, VRI-USI discovered 85-90% of analytes modeled to vary between sample classes. Nineteen out of 53 peaks in the peak table developed for the yeast metabolome dataset had the same sample index assignments, indicating that those indices are most likely due to class-distinguishing chemical differences. A t-test revealed that 22 out of 53 peaks were statistically significant (p < 0.05) when using those sample index assignments. Likewise, for the human cancer metabolomics study, VRI-USI discovered 25 analytes that were statistically different (p < 0.05) using the sample index assignments determined to highlight meaningful sample-based differences. For all datasets, the sample index assignments deduced from VRI-USI matched the correct class-based difference when checked against prior knowledge. VRI-USI holds promise as an exploratory data analysis workflow for studies in which analysts do not readily have a priori class information or want to uncover the underlying nature of their dataset.
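The workflow outlined in the abstract (rank peaks by relative variance, cluster each peak on its own, then look for a recurring sample index assignment) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes relative standard deviation as the variance measure, two clusters per peak, a hand-rolled 1-D k-means seeded at the minimum and maximum, and a made-up toy peak table; the names `relative_variance`, `two_means_1d`, `sample_index`, `vri_usi_sketch`, and `top_n` are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_variance(values):
    """Relative standard deviation (std / mean) of one peak across samples.
    Assumption: the abstract does not define 'relative variance' exactly."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def two_means_1d(values, max_iter=50):
    """Plain 1-D k-means with k=2 (an assumed k), seeded at min and max."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign each sample to the nearer of the two centers.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not g0 or not g1:  # degenerate peak, e.g. all values equal
            break
        n0, n1 = mean(g0), mean(g1)
        if (n0, n1) == (c0, c1):  # centers stopped moving
            break
        c0, c1 = n0, n1
    return labels


def sample_index(values):
    """Cluster one peak and relabel so the first sample is always group 0.
    This makes 'up in class A' and 'down in class A' give the same index."""
    labels = two_means_1d(values)
    if labels[0] == 1:
        labels = [1 - lab for lab in labels]
    return tuple(labels)


def vri_usi_sketch(peak_table, top_n):
    """Keep the top_n peaks by relative variance, index each, count repeats."""
    ranked = sorted(peak_table,
                    key=lambda name: relative_variance(peak_table[name]),
                    reverse=True)[:top_n]
    assignments = {name: sample_index(peak_table[name]) for name in ranked}
    return assignments, Counter(assignments.values())


# Toy peak table (fabricated for illustration): six samples per peak,
# where samples 1-3 and samples 4-6 are meant to be two classes.
toy_peaks = {
    "p1": [10, 11, 9, 30, 31, 29],        # higher in the second class
    "p2": [50, 52, 48, 20, 19, 21],       # lower in the second class
    "p3": [5, 15, 5, 16, 5, 14],          # varies, but not by class
    "p4": [100, 101, 99, 100, 101, 99],   # low variance, filtered out
}
assignments, counts = vri_usi_sketch(toy_peaks, top_n=3)
print(counts.most_common(1))
```

In this toy case, peaks `p1` and `p2` yield the same assignment while `p3` yields a different one, so the repeated assignment is the candidate class structure. With six samples and two non-empty groups there are 2^5 - 1 = 31 possible assignments, which gives some intuition for why a repeat is unlikely to be coincidence; the paper's own probabilistic argument is not reproduced here.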
Pages: 12