A scalable supervised algorithm for dimensionality reduction on streaming data

被引:19
作者
Yan, Jun
Zhang, Benyu
Yan, Shuicheng
Liu, Ning
Yang, Qiang
Cheng, Qiansheng
Li, Hua
Chen, Zheng
Ma, Wei-Ying
机构
[1] Microsoft Res Asia, Beijing 100080, Peoples R China
[2] Peking Univ, Sch Math Sci, Dept Informat Sci, LMAM, Beijing 100871, Peoples R China
[3] Tsinghua Univ, Dept Math, Beijing 100084, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[5] Peking Univ, Sch Math Sci, Dept Math, Beijing 100871, Peoples R China
关键词
dimensionality reduction; streaming data; principal component analysis (PCA); linear discriminant analysis (LDA); maximum margin criterion (MMC);
D O I
10.1016/j.ins.2005.11.005
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are greatly interesting due to the desirability of real tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores valuable label information for classification. On the other hand, the performance of LDA is degraded when encountering limited available low-dimensional spaces and singularity problem. Recently, Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm could not satisfy the streaming data model to handle large-scale high-dimensional data set. Thus an effective, efficient and scalable approach is needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm and its extension to infer adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and real datasets demonstrate the superior performance of our proposed algorithm on streaming data. (C) 2005 Elsevier Inc. All rights reserved.
引用
收藏
页码:2042 / 2065
页数:24
相关论文
共 28 条
[1]  
[Anonymous], P 10 ACM SIGKDD INT
[2]  
[Anonymous], 2003, P 20 INT C MACH LEAR
[3]  
ARTAE M, 2002, P 16 INT C PAT REC Q
[4]  
BABCOCK B, 2002, P ACM PODS
[5]  
Balakrishnama S., 1998, LINEAR DISCRIMINANT
[6]  
Blake C.L., 1998, UCI repository of machine learning databases
[7]  
BRODER A, 2002, P ALL C
[8]  
Charikar M., 2002, P ICALP
[9]  
GIBBONS P, 2001, DISTINCT SAMPLING HI
[10]  
GILBERT AC, 2001, VLDB J, P79