A Framework for Local Supervised Dimensionality Reduction of High Dimensional Data

Times cited: 0
Author
Aggarwal, Charu C. [1]
Affiliation
[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2006
Keywords
classification; dimensionality reduction
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
High-dimensional data presents a challenge to the classification problem because it is difficult to model the precise relationship between the large number of feature variables and the class variable. In such cases, it may be desirable to reduce the data to a small number of dimensions in order to improve the accuracy and effectiveness of the classification process. While dimensionality reduction has been well studied in the unsupervised setting, it has not been explored as extensively in the supervised case. Existing supervised dimensionality reduction techniques are too slow for practical use on high-dimensional data, and they try to find global discriminants in the data. However, the behavior of the data often varies considerably with locality, and different subspaces may provide better discrimination in different localities. Finding such local discriminants is an even more challenging task than the global discrimination problem because the data must also be localized. In this paper, we propose the novel idea of supervised subspace sampling in order to create a reduced representation of the data for classification applications in an efficient and effective way. The method exploits the natural distribution of the different classes in order to sample the subspaces that best discriminate between them. Because it is based on sampling, the procedure is extremely fast and scales almost linearly with both data set size and dimensionality.
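The abstract describes supervised subspace sampling only at a high level. The sketch below is a hypothetical illustration of the general idea of picking a discriminative subspace within a locality, not the paper's algorithm. It treats the k nearest neighbours of a query point as a "locality", draws random axis-parallel subspaces, and keeps the one whose classes are best separated. The scoring criterion (a Fisher-style ratio of between-class to within-class scatter), the k-NN definition of locality, and the function names `local_points`, `discrimination_score`, and `sample_best_subspace` are all assumptions made for illustration. Unlike the method described, which uses the class distribution to guide which subspaces are sampled, this sketch samples subspaces uniformly at random and uses the labels only to score them.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def local_points(data: List[Point], labels: List[int], query: Point,
                 k: int) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: return the k training points closest to `query`
    (squared Euclidean distance in the full space) with their labels.
    This k-NN neighbourhood is an assumed stand-in for a 'data locality'."""
    order = sorted(
        range(len(data)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(data[i], query)),
    )[:k]
    return [data[i] for i in order], [labels[i] for i in order]


def discrimination_score(points: List[Point], labels: List[int],
                         dims: Tuple[int, ...]) -> float:
    """Assumed criterion: Fisher-style ratio of between-class to within-class
    scatter, computed only on the axis-parallel subspace given by `dims`."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    n = len(points)
    overall = [sum(p[d] for p in points) / n for d in dims]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        # Between-class scatter: distance of each class centroid from the overall mean.
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # Within-class scatter: spread of the class around its own centroid.
        for p in members:
            within += sum((p[d] - c) ** 2 for d, c in zip(dims, centroid))
    return between / (within + 1e-12)  # epsilon guards against zero within-class scatter


def sample_best_subspace(points: List[Point], labels: List[int], num_dims: int,
                         subspace_size: int, num_samples: int,
                         seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    """Hypothetical helper: sample `num_samples` random axis-parallel subspaces
    of `subspace_size` dimensions and keep the highest-scoring one."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    rng = random.Random(seed)
    best_dims: Tuple[int, ...] = ()
    best_score = float("-inf")
    for _ in range(num_samples):
        dims = tuple(sorted(rng.sample(range(num_dims), subspace_size)))
        score = discrimination_score(points, labels, dims)
        if score > best_score:
            best_dims, best_score = dims, score
    return best_dims, best_score


# Toy data (made up for illustration): dimension 0 separates the two classes,
# while dimensions 1 and 2 carry no class information.
data = [
    [0.0, 1.0, 3.0], [0.2, 4.0, 0.0], [0.1, 2.0, 5.0],  # class 0
    [5.0, 1.5, 3.5], [5.2, 3.5, 0.5], [4.9, 2.5, 4.5],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    pts, ys = local_points(data, labels, query=[2.5, 2.5, 2.5], k=6)
    dims, score = sample_best_subspace(pts, ys, num_dims=3, subspace_size=1,
                                       num_samples=50)
    print("best subspace:", dims, "score:", round(score, 2))
```

In this toy example, dimension 0 has far higher between-class scatter relative to its within-class scatter than the two noise dimensions, so it is the subspace the sampler should keep. Scoring one candidate subspace takes a single pass over the local points, so the cost grows linearly with the number of points and the number of samples; this is one plausible reading of why a sampling approach can stay fast, though the paper's own procedure and complexity analysis may differ.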
Pages: 360-371
Page count: 12