A preliminary geometric structure simplification for Principal Component Analysis

Cited by: 2
Authors
Gu, Huamao [1 ]
Lin, Tong [2 ]
Wang, Xun [1 ]
Affiliations
[1] Zhejiang Gongshang Univ, Sch Comp Sci & Informat Engn, Hangzhou 310018, Zhejiang, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, State Key Lab Machine Percept, Beijing 100871, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Data preprocessing; PCA; Geometric structure; Dimensionality reduction; Eigenmaps
DOI
10.1016/j.neucom.2018.05.119
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-world data are often geometrically nonlinear and are therefore difficult to process with traditional linear methods. Many existing nonlinear dimensionality reduction techniques require careful parameter tuning and cannot be applied to real data stably and consistently. In this article we propose an efficient data preprocessing algorithm, called Curve Straightening Transformation (CST), that flattens the nonlinear geometric structure of data. After this step, Principal Component Analysis (PCA) and other linear projection methods are adequate for the dimensionality reduction task in most cases. In this respect, the proposed CST algorithm can be regarded as a geometric preprocessing step tailored for PCA. Comprehensive experiments on both artificial and real datasets demonstrate that the proposed preprocessing algorithm simplifies nonlinear geometric structures and that the flattened data are suitable for further dimensionality reduction by linear methods such as PCA. (C) 2018 Elsevier B.V. All rights reserved.
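The motivation in the abstract can be illustrated with a toy sketch. This is not the CST algorithm, whose details are not given in this record. It only compares how much variance a single principal component captures on a curved dataset and on a hand-flattened version of the same data. The function name `pca`, the three-quarter circle data, and the flattening by the known curve parameter `t` are all assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Toy curved data: 200 points on three quarters of a unit circle.
t = np.linspace(0.0, 1.5 * np.pi, 200)
curved = np.column_stack([np.cos(t), np.sin(t)])

# Hand-made stand-in for a flattening step (NOT CST): because the curve
# parameter t is known here, each point is placed at (t, 0) on a line.
flat = np.column_stack([t, np.zeros_like(t)])

_, ratio_curved = pca(curved, 1)
_, ratio_flat = pca(flat, 1)

print("variance captured by 1 component, curved data:", ratio_curved[0])
print("variance captured by 1 component, flattened data:", ratio_flat[0])
```

On the curved data a single linear component leaves a substantial share of the variance unexplained, while on the flattened data one component is sufficient. A real preprocessing method has to recover such a flattening without access to `t`.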
Pages: 46-55
Number of pages: 10