Unique Ion Filter: A Data Reduction Tool for GC/MS Data Preprocessing Prior to Chemometric Analysis

Cited by: 17
Authors
Adutwum, L. A. [1]
Harynuk, J. J. [1]
Affiliation
[1] Univ Alberta, Dept Chem, Edmonton, AB T6G 2G2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
GAS CHROMATOGRAPHY/MASS SPECTROMETRY; MASS-SPECTROMETRY; FEATURE-SELECTION; PATTERN-RECOGNITION; OIL-SPILL; GASOLINE; CLASSIFICATION; SAMPLES; GC; DISCRIMINATION;
DOI
10.1021/ac501660a
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Using raw GC/MS data as the X-block for chemometric modeling has the potential to provide better classification models for complex samples when compared to using the total ion current (TIC), extracted ion chromatograms/profiles (EIC/EIP), or integrated peak tables. However, the abundance of raw GC/MS data necessitates some form of data reduction/feature selection to remove the variables containing primarily noise from the data set. Several algorithms for feature selection exist; however, due to the extreme number of variables (10^6 to 10^8 variables per chromatogram), feature selection can be slow and computationally expensive. Herein, we present a new prefilter for automated data reduction of GC/MS data prior to feature selection. This tool, termed unique ion filter (UIF), is a module that can be added after chromatographic alignment and prior to any subsequent feature selection algorithm. The UIF objectively reduces the number of irrelevant or redundant variables in raw GC/MS data, while preserving potentially relevant analytical information. In the m/z dimension, data are reduced from a full spectrum to a handful of unique ions for each chromatographic peak. In the time dimension, data are reduced to only a handful of scans around each peak apex. UIF was applied to a data set of GC/MS data for a variety of gasoline samples to be classified using partial least-squares discriminant analysis (PLS-DA) according to octane rating. It was also applied to a series of chromatograms from casework fire debris analysis to be classified on the basis of whether or not signatures of gasoline were detected. By reducing the overall population of candidate variables subjected to subsequent variable selection, the UIF reduced the total feature selection time needed to achieve a perfect classification of all validation data from 373 to 9 min (a 98% reduction in computing time). Additionally, the significant reduction in included variables resulted in a concomitant reduction in noise, improving overall model quality. A minimum of two unique m/z values (um/z) and a scan window of three scans about the peak apex provided enough information about each peak for successful PLS-DA modeling of the data, as 100% model prediction accuracy was achieved. It is also shown that the application of UIF does not alter the underlying chemical information in the data.
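The two reduction steps described in the abstract (a few ions per peak in the m/z dimension, a few scans around each apex in the time dimension) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `build_keep_mask` and `apply_mask`, the toy data, and the assumption that peak apex positions come from a separate peak-picking step are all hypothetical. In particular, the ions are ranked here simply by intensity at the apex, which is a placeholder for the uniqueness criterion of the actual UIF, which the abstract does not specify. The defaults of two ions and a three-scan window follow the minimum values reported in the abstract.

```python
def build_keep_mask(data, apex_scans, n_ions=2, scan_window=3):
    """Return the set of (scan, channel) index pairs retained by the sketch filter.

    data        : list of scans, each a list of intensities (one per m/z channel)
    apex_scans  : scan indices of peak apexes (assumed to come from prior peak picking)
    n_ions      : number of m/z channels kept per peak
    scan_window : odd number of scans kept, centred on the apex
    """
    if scan_window % 2 == 0:
        raise ValueError("scan_window must be odd")
    half = scan_window // 2
    n_scans = len(data)
    keep = set()
    for apex in apex_scans:
        spectrum = data[apex]
        # Placeholder ranking (assumption): most intense channels at the apex.
        # The real UIF selects ions that are unique to the peak instead.
        ranked = sorted(range(len(spectrum)),
                        key=lambda ch: spectrum[ch], reverse=True)
        ions = ranked[:n_ions]
        # Clip the scan window at the chromatogram boundaries.
        first = max(0, apex - half)
        last = min(n_scans - 1, apex + half)
        for scan in range(first, last + 1):
            for ch in ions:
                keep.add((scan, ch))
    return keep


def apply_mask(data, keep):
    """Flatten the retained variables into a feature vector, ordered by (scan, channel)."""
    return [data[scan][ch] for scan, ch in sorted(keep)]


# Toy chromatogram: 7 scans x 4 m/z channels, with apexes at scans 2 and 5.
toy = [
    [0, 1, 0, 0],
    [1, 5, 2, 0],
    [2, 9, 6, 1],
    [1, 4, 2, 0],
    [0, 1, 0, 3],
    [1, 0, 2, 8],
    [0, 0, 1, 3],
]
mask = build_keep_mask(toy, apex_scans=[2, 5], n_ions=2, scan_window=3)
features = apply_mask(toy, mask)
```

In this toy case 12 of the 28 variables are retained. The same mask would be applied to every aligned chromatogram so that all samples share one reduced variable set before any subsequent feature selection.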
Pages: 7726-7733
Number of pages: 8