DIMENSIONALITY REDUCTION OF HIGH-DIMENSIONAL DATA WITH A NONLINEAR PRINCIPAL COMPONENT ALIGNED GENERATIVE TOPOGRAPHIC MAPPING

Cited by: 2
Authors
Griebel, M. [1]
Hullmann, A. [1]
Affiliations
[1] Univ Bonn, Inst Numer Simulat, D-53115 Bonn, Germany
Keywords
dimensionality reduction; generative topographic mapping; principal component analysis; density estimation; additive model; classification; EM algorithm
DOI
10.1137/130931382
CLC classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Most high-dimensional real-life data exhibit dependencies, so that data points do not populate the whole data space but lie approximately on a lower-dimensional manifold. A major problem in many data mining applications is the detection of such a manifold and the expression of the given data in terms of a moderate number of latent variables. We present a method that is derived from the generative topographic mapping (GTM) and can be seen as a nonlinear generalization of principal component analysis (PCA). It can detect certain nonlinearities in the data but, unlike the original GTM, does not suffer from the curse of dimensionality with respect to the latent space dimension, and thus allows for higher embedding dimensions. We provide experiments showing that our approach leads to improved data reconstruction compared to purely linear PCA and that it can furthermore be used for classification.
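For orientation, the purely linear PCA baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the authors' method or code: the helper name `pca_reconstruct`, the toy parabola data, and the sample size are all assumptions made here. It shows only why a linear projection leaves a residual on curved data, which is the gap a nonlinear generalization aims to close.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project X onto its top-k principal components and map back.

    Hypothetical helper for illustration; not taken from the paper.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T            # (D, k) orthonormal basis of the latent subspace
    Z = Xc @ W              # latent coordinates, shape (N, k)
    return Z @ W.T + mean, Z

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=500)
# Toy data (an assumption): a 1-D parabola embedded in 3-D space.
X = np.column_stack([t, t**2, np.zeros_like(t)])

# One linear latent variable cannot follow the curve, so error remains.
X_hat1, Z1 = pca_reconstruct(X, k=1)
mse_1 = np.mean(np.sum((X - X_hat1) ** 2, axis=1))

# Two linear latent variables span the plane containing the curve.
X_hat2, _ = pca_reconstruct(X, k=2)
mse_2 = np.mean(np.sum((X - X_hat2) ** 2, axis=1))

print(f"MSE with k=1: {mse_1:.4f}, with k=2: {mse_2:.2e}")
```

In this toy setting the data have a single intrinsic degree of freedom, yet linear PCA needs two components to reconstruct them exactly; a nonlinear mapping could in principle do so with one latent variable.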
Pages: A1027-A1047
Page count: 21