Efficient Dimensionality Reduction for Sparse Binary Data

Citations: 0
Authors
Pratap, Rameshwar
Kulkarni, Raghav [1]
Sohony, Ishan [2]
Affiliations
[1] CMI, Chennai, Tamil Nadu, India
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Source
2018 IEEE International Conference on Big Data (Big Data), 2018
Keywords
Dimensionality Reduction; Sketching; Binary Data; Similarity Search; Locality Sensitive Hashing
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a dimensionality reduction (sketching) algorithm for high-dimensional, sparse, binary data. Our proposed algorithm provides a single sketch which simultaneously preserves multiple similarity measures, including Hamming distance, inner product, and Jaccard similarity [12]. In contrast to the "local projection" strategy used by most of the earlier algorithms [6], [4], [7], our approach exploits sparsity and combines the following two strategies: (1) partitioning the dimensions into several buckets, and (2) obtaining "global linear summaries" within those buckets. Our algorithm is faster than the existing state of the art, and it preserves the binary format of the data after the dimensionality reduction, which makes the sketch space-efficient. Our algorithm can also be easily adapted to streaming and incremental learning frameworks. We give a rigorous theoretical analysis of the dimensionality reduction bounds and complement it with extensive experiments. Our proposed algorithm is simple and easy to implement in practice.
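The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say how dimensions are assigned to buckets or which linear summary is used, so everything specific here is an assumption: dimensions are assigned to buckets uniformly at random, and each bucket is summarized by the parity (sum modulo 2) of its bits, one choice that keeps the output binary. The names `make_bucket_map`, `sketch`, and `hamming` are hypothetical helpers, not taken from the paper.

```python
import random


def make_bucket_map(dim, num_buckets, seed=0):
    """Assign each of `dim` dimensions to one of `num_buckets` buckets.

    Assumption: uniform random assignment; the abstract only says the
    dimensions are partitioned into buckets.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_buckets) for _ in range(dim)]


def sketch(nonzero_indices, bucket_map, num_buckets):
    """Compress a sparse binary vector, given as the indices of its 1-bits.

    Assumption: the per-bucket summary is the parity of the bits that fall
    in the bucket, which is linear over GF(2) and keeps the sketch binary.
    The cost is proportional to the number of 1-bits, not the dimension.
    """
    out = [0] * num_buckets
    for i in nonzero_indices:
        out[bucket_map[i]] ^= 1
    return out


def hamming(a, b):
    """Hamming distance between two equal-length binary lists."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical usage: two sparse vectors in a 1000-dimensional space.
bucket_map = make_bucket_map(dim=1000, num_buckets=64, seed=1)
u = {3, 17, 256, 900}
v = {3, 17, 300, 901}
su = sketch(u, bucket_map, 64)
sv = sketch(v, bucket_map, 64)
print(hamming(su, sv), "vs. original Hamming distance", len(u ^ v))
```

Under these assumptions the sketch distance can never exceed the original Hamming distance, and it is smaller only when differing positions collide in the same bucket, which is rare when the data is sparse relative to the number of buckets. Because each 1-bit is processed independently, the same loop also handles a stream of incoming indices.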
Pages: 152-157
Page count: 6