Comparison of dimension reduction techniques in the analysis of mass spectrometry data

Times cited: 14
Authors
Isokaanta, Sini [1]
Kari, Eetu [1,3]
Buchholz, Angela [1]
Hao, Liqing [1]
Schobesberger, Siegfried [1]
Virtanen, Annele [1]
Mikkonen, Santtu [1,2]
Affiliations
[1] Univ Eastern Finland, Dept Appl Phys, Kuopio 70210, Finland
[2] Univ Eastern Finland, Dept Environm & Biol Sci, Kuopio 70210, Finland
[3] Neste Oyj, Espoo 02150, Finland
Funding
European Union Horizon 2020; Academy of Finland
Keywords
EXPLORATORY FACTOR-ANALYSIS; SECONDARY ORGANIC AEROSOL; NONNEGATIVE MATRIX FACTORIZATION; SOURCE APPORTIONMENT; NUMBER; DECONVOLUTION; EMISSIONS; COMPLEX; FIT; VALIDATION;
DOI
10.5194/amt-13-2995-2020
Chinese Library Classification number
P4 [Atmospheric sciences (meteorology)]
Subject classification code
0706; 070601
Abstract
Online analysis with mass spectrometers produces complex data sets, consisting of mass spectra with a large number of chemical compounds (ions). Statistical dimension reduction techniques (SDRTs) are able to condense complex data sets into a more compact form while preserving the information included in the original observations. The general principle of these techniques is to investigate the underlying dependencies of the measured variables by combining variables with similar characteristics into distinct groups, called factors or components. Currently, positive matrix factorization (PMF) is the most commonly exploited SDRT across a range of atmospheric studies, in particular for source apportionment. In this study, we used five different SDRTs in analysing mass spectral data from complex gas- and particle-phase measurements during a laboratory experiment investigating the interactions of gasoline car exhaust and alpha-pinene. Specifically, we used four factor analysis techniques, namely principal component analysis (PCA), PMF, exploratory factor analysis (EFA) and non-negative matrix factorization (NMF), as well as one clustering technique, partitioning around medoids (PAM). All SDRTs were able to resolve four to five factors from the gas-phase measurements, including an alpha-pinene precursor factor, two to three oxidation product factors, and a background or car exhaust precursor factor. NMF and PMF provided an additional oxidation product factor, which was not found by other SDRTs. The results from EFA and PCA were similar after applying oblique rotations. For the particle-phase measurements, four factors were discovered with NMF: one primary factor, a mixed-LVOOA factor and two alpha-pinene secondary-organic-aerosol-derived (SOA-derived) factors. PMF was able to separate two factors: semi-volatile oxygenated organic aerosol (SVOOA) and low-volatility oxygenated organic aerosol (LVOOA).
PAM was not able to resolve interpretable clusters due to general limitations of clustering methods, as the high degree of fragmentation taking place in the aerosol mass spectrometer (AMS) causes different compounds formed at different stages in the experiment to be detected at the same variable. However, when preliminary analysis is needed, or isomers and mixed sources are not expected, cluster analysis may be a useful tool, as the results are simpler and thus easier to interpret. In the factor analysis techniques, any single ion generally contributes to multiple factors, although EFA and PCA try to minimize this spread. Our analysis shows that different SDRTs put emphasis on different parts of the data, and with only one technique, some interesting data properties may still stay undiscovered. Thus, validation of the acquired results, either by comparing between different SDRTs or applying one technique multiple times (e.g. by resampling the data or giving different starting values for iterative algorithms), is important, as it may protect the user from dismissing unexpected results as "un-physical".
Pages: 2995-3022
Number of pages: 28