CIPCA: Complete-Information-based Principal Component Analysis for interval-valued data

Cited by: 62
Authors
Wang, Huiwen [1]
Guan, Rong [1]
Wu, Junjie [1]
Affiliation
[1] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Principal Component Analysis (PCA); Interval-valued Data; Complete-Information-based Principal Component Analysis (CIPCA)
DOI
10.1016/j.neucom.2012.01.018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Principal Component Analysis (PCA) has long been used as a tool in exploratory data analysis and for building predictive models. Recent years have witnessed the continuous emergence of huge volumes of data from various computerized industries, which calls for more efficient and effective PCA methods. In light of this, this paper focuses on interval-valued data and proposes a new PCA method called CIPCA. CIPCA distinguishes itself from various well-established methods, e.g., VPCA and CPCA, in that it captures the complete information in interval-valued observations. Taking a hypercube view, in which each observation is a hypercube filled with infinitely dense, uniformly distributed points, CIPCA defines the inner product of interval-valued variables and thereby reduces PCA modeling to computing the inner products that make up the covariance matrix. Comparative experiments against VPCA and CPCA on synthetic data sets, together with applications to real-world data, demonstrate the merits of CIPCA in modeling interval-valued data. In particular, CIPCA provides an efficient and effective way to conduct PCA on large-scale numerical data, and can find meaningful structural information hidden in massive data. (C) 2012 Elsevier B.V. All rights reserved.
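The hypercube view in the abstract can be illustrated with a small sketch. The sketch treats each interval-valued observation as a hypercube over which points are spread uniformly, with every observation weighted equally. It computes the covariance of that mixture in closed form and then eigen-decomposes it. Inside one hypercube the coordinates are independent uniforms, so the off-diagonal entries use interval midpoints, while the diagonal entries use the second moment (a^2 + ab + b^2)/3 of a uniform on [a, b]. This is an illustrative reading of the abstract, not the authors' published algorithm. The function names are hypothetical. The paper's exact normalization (for example, standardizing variables first, or 1/n versus 1/(n-1)) may differ. The interval projection of PC scores is likewise an assumed, commonly used choice.

```python
import numpy as np


def interval_covariance(lower, upper):
    """Covariance of the mixture formed by spreading each row uniformly over its
    hypercube [lower[i], upper[i]], with equal weight per row (hypothetical helper)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.shape[0]
    mid = (lower + upper) / 2.0            # E[x_ij] for a uniform on [a_ij, b_ij]
    mean = mid.mean(axis=0)                # overall mean of each variable
    # Off-diagonal second moments: coordinates are independent inside a hypercube,
    # so E[x_ij * x_ik] = mid_ij * mid_ik for j != k.
    second = mid.T @ mid / n
    # Diagonal second moments: E[x^2] for a uniform on [a, b] is (a^2 + a*b + b^2) / 3.
    diag = ((lower**2 + lower * upper + upper**2) / 3.0).mean(axis=0)
    np.fill_diagonal(second, diag)
    return second - np.outer(mean, mean)


def interval_pca(lower, upper):
    """Eigen-decompose the interval covariance. Returns (eigenvalues, eigenvectors,
    mean), sorted by decreasing eigenvalue (hypothetical helper)."""
    cov = interval_covariance(lower, upper)
    vals, vecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    mean = ((np.asarray(lower, float) + np.asarray(upper, float)) / 2.0).mean(axis=0)
    return vals[order], vecs[:, order], mean


def project_intervals(lower, upper, mean, direction):
    """Project each centered hypercube onto `direction`. The image is an interval:
    per coordinate, pick the smaller or larger endpoint according to the weight's sign."""
    a = (np.asarray(lower, dtype=float) - mean) * direction
    b = (np.asarray(upper, dtype=float) - mean) * direction
    return np.minimum(a, b).sum(axis=1), np.maximum(a, b).sum(axis=1)


if __name__ == "__main__":
    # Hypothetical toy data: 4 observations of 2 interval-valued variables.
    lower = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 4.0], [4.5, 6.0]])
    upper = np.array([[1.5, 3.0], [2.5, 4.0], [4.0, 5.5], [5.0, 7.0]])
    vals, vecs, mean = interval_pca(lower, upper)
    print("explained variance ratio:", vals / vals.sum())
    print("PC1 score intervals:", project_intervals(lower, upper, mean, vecs[:, 0]))
```

When every interval is degenerate (lower equals upper), the covariance above reduces to the ordinary population covariance. Classical PCA is therefore recovered as a special case. Wider intervals add their within-hypercube spread to the diagonal.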
Pages: 158-169
Number of pages: 12