Efficient Dimensionality Reduction for Sparse Binary Data

Cited by: 0
Authors
Pratap, Rameshwar
Kulkarni, Raghav [1]
Sohony, Ishan [2]
Affiliations
[1] CMI, Chennai, Tamil Nadu, India
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
Dimensionality Reduction; Sketching; Binary Data; Similarity Search; Locality Sensitive Hashing;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a dimensionality reduction (sketching) algorithm for high-dimensional, sparse, binary data. The algorithm produces a single sketch that simultaneously preserves multiple similarity measures, including Hamming distance, inner product, and Jaccard similarity [12]. In contrast to the "local projection" strategy used by most earlier algorithms [6], [4], [7], our approach exploits sparsity and combines two strategies: (1) partitioning the dimensions into several buckets, and (2) obtaining "global linear summaries" within those buckets. Our algorithm is faster than the existing state of the art, and it preserves the binary format of the data after dimensionality reduction, which makes the sketch space-efficient. It can also be easily adapted to streaming and incremental learning frameworks. We give a rigorous theoretical analysis of the dimensionality reduction bounds and complement it with extensive experiments. The algorithm is simple and easy to implement in practice.
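The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say which linear summary is used inside each bucket, so this example assumes parity (sum modulo 2, i.e. XOR), which is one natural choice that keeps the sketch binary. The function names, the uniform random assignment of dimensions to buckets, and the parameter values are all hypothetical and are not taken from the paper.

```python
import random


def make_bucket_map(dim, num_buckets, seed=0):
    """Assign each of the `dim` original dimensions to one bucket at random.

    Assumption: a uniform random partition; the paper may use another scheme.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_buckets) for _ in range(dim)]


def sketch(nonzero_indices, bucket_map, num_buckets):
    """Compress a sparse binary vector, given as its distinct nonzero indices.

    Each output bit is the parity of the input bits that fall in that bucket
    (assumed summary). Work is proportional to the number of nonzeros.
    """
    out = [0] * num_buckets
    for i in nonzero_indices:
        out[bucket_map[i]] ^= 1  # flip the parity bit of this index's bucket
    return out


def hamming(a, b):
    """Hamming distance between two equal-length binary lists."""
    return sum(x != y for x, y in zip(a, b))


# Illustrative usage with made-up sizes and vectors.
DIM, BUCKETS = 10_000, 64
bucket_map = make_bucket_map(DIM, BUCKETS, seed=1)
u = {3, 17, 256, 999}    # nonzero positions of the first vector
v = {3, 18, 256, 5000}   # nonzero positions of the second vector
su = sketch(u, bucket_map, BUCKETS)
sv = sketch(v, bucket_map, BUCKETS)
print("original Hamming distance:", len(u ^ v))
print("sketch Hamming distance:  ", hamming(su, sv))
```

Because parity is linear over GF(2), the sketch of the XOR of two vectors equals the XOR of their sketches, so the Hamming distance between sketches never exceeds the original distance and matches it whenever no two differing positions share a bucket. Updating a sketch when one input bit flips touches a single bucket, which is in keeping with the streaming use the abstract mentions.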
Pages: 152 - 157
Number of pages: 6
Related papers
50 in total
  • [31] Sparse tensor dimensionality reduction with application to clustering of functional connectivity
    Frusque, Gaetan
    Jung, Julien
    Borgnat, Pierre
    Goncalves, Paulo
    WAVELETS AND SPARSITY XVIII, 2019, 11138
  • [32] Sparse Low-Rank Preserving Projection for Dimensionality Reduction
    Liu, Zhonghua
    Wang, Jingjing
    Liu, Gang
    Pu, Jiexin
    IEEE ACCESS, 2019, 7 : 22941 - 22951
  • [33] Denoising and dimensionality reduction of genomic data
    Capobianco, E
    FLUCTUATIONS AND NOISE IN BIOLOGICAL, BIOPHYSICAL, AND BIOMEDICAL SYSTEMS III, 2005, 5841 : 69 - 80
  • [34] PCA Dimensionality Reduction for Categorical Data
    Denisiuk, Aleksander
    COMPUTATIONAL SCIENCE, ICCS 2024, PT III, 2024, 14834 : 179 - 186
  • [35] Sparse robust adaptive unsupervised subspace learning for dimensionality reduction
    Xiong, Weizhi
    Yu, Guolin
    Ma, Jun
    Liu, Sheng
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 129
  • [36] Dimensionality reduction of clustered data sets
    Sanguinetti, Guido
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (03) : 535 - 540
  • [37] DRESS: dimensionality reduction for efficient sequence search
    Kotsifakos, Alexios
    Stefan, Alexandra
    Athitsos, Vassilis
    Das, Gautam
    Papapetrou, Panagiotis
    DATA MINING AND KNOWLEDGE DISCOVERY, 2015, 29 (05) : 1280 - 1311
  • [38] A Review of Dimensionality Reduction Techniques for Efficient Computation
    Velliangiri, S.
    Alagumuthukrishnan, S.
    Joseph, S. Iwin Thankumar
    2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ADVANCED COMPUTING ICRTAC - DISRUPTIVE INNOVATION, 2019, 2019, 165 : 104 - 111
  • [39] DRESS: dimensionality reduction for efficient sequence search
    Alexios Kotsifakos
    Alexandra Stefan
    Vassilis Athitsos
    Gautam Das
    Panagiotis Papapetrou
    Data Mining and Knowledge Discovery, 2015, 29 : 1280 - 1311
  • [40] EFFICIENT SUPERVISED DIMENSIONALITY REDUCTION FOR IMAGE CATEGORIZATION
    Benmokhtar, Rachid
    Delhumeau, Jonathan
    Gosselin, Philippe-Henri
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 2425 - 2428