Efficient Dimensionality Reduction for Sparse Binary Data

Cited by: 0
Authors
Pratap, Rameshwar
Kulkarni, Raghav [1]
Sohony, Ishan [2]
Affiliations
[1] CMI, Chennai, Tamil Nadu, India
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
Dimensionality Reduction; Sketching; Binary Data; Similarity Search; Locality Sensitive Hashing;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a dimensionality reduction (sketching) algorithm for high-dimensional, sparse, binary data. The algorithm produces a single sketch that simultaneously preserves multiple similarity measures, including Hamming distance, inner product, and Jaccard similarity [12]. In contrast to the "local projection" strategy used by most earlier algorithms [6], [4], [7], our approach exploits sparsity and combines two strategies: (1) partitioning the dimensions into several buckets, and (2) obtaining "global linear summaries" within those buckets. Our algorithm is faster than the existing state of the art, and it preserves the binary format of the data after dimensionality reduction, which makes the sketch space-efficient. It can also be easily adapted to streaming and incremental learning frameworks. We give a rigorous theoretical analysis of the dimensionality reduction bounds and complement it with extensive experiments. The algorithm is simple and easy to implement in practice.
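The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say which linear summary is computed inside each bucket, so the code below assumes it is the parity (sum modulo 2) of the bits that fall into the bucket, which keeps the output binary. The function names (make_bucket_map, sketch, hamming) and all parameters are hypothetical and are not taken from the paper.

```python
import random


def make_bucket_map(dim, n_buckets, seed=0):
    """Assign each of the `dim` input dimensions to one random bucket."""
    rng = random.Random(seed)
    return [rng.randrange(n_buckets) for _ in range(dim)]


def sketch(nonzero_indices, bucket_map, n_buckets):
    """Compress a sparse binary vector, given by the indices of its 1-bits.

    Assumption: each sketch bit is the parity of the input bits mapped to
    that bucket. Only the nonzero entries are visited, so the cost scales
    with the sparsity rather than with the full dimension.
    """
    out = [0] * n_buckets
    for i in nonzero_indices:
        out[bucket_map[i]] ^= 1
    return out


def hamming(a, b):
    """Hamming distance between two equal-length binary lists."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    dim, n_buckets = 10_000, 256
    bucket_map = make_bucket_map(dim, n_buckets, seed=42)

    u = {1, 5, 900, 4000}   # 1-bits of the first vector
    v = {1, 5, 77, 8123}    # 1-bits of the second vector

    su = sketch(u, bucket_map, n_buckets)
    sv = sketch(v, bucket_map, n_buckets)

    print("original Hamming distance:", len(u ^ v))
    print("sketch Hamming distance:  ", hamming(su, sv))
```

Under this parity assumption the sketch is linear over GF(2), so the Hamming distance between two sketches never exceeds the original distance, and the two are equal whenever no two differing dimensions share a bucket. For sparse inputs and enough buckets such collisions are rare, which is the intuition behind choosing the number of buckets from the sparsity rather than from the full dimension.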
Pages: 152 - 157
Number of pages: 6
Related papers
50 in total
  • [41] EFFICIENT DIMENSIONALITY REDUCTION FOR CANONICAL CORRELATION ANALYSIS
    Avron, Haim
    Boutsidis, Christos
    Toledo, Sivan
    Zouzias, Anastasios
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (05) : S111 - S131
  • [42] Biclustering Sparse Binary Genomic Data
    van Uitert, Miranda
    Meuleman, Wouter
    Wessels, Lodewyk
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2008, 15 (10) : 1329 - 1345
  • [43] Exploring combinations of dimensionality reduction, transfer learning, and regularization methods for predicting binary phenotypes with transcriptomic data
    Oshternian, S. R.
    Loipfinger, S.
    Bhattacharya, A.
    Fehrmann, R. S. N.
    BMC BIOINFORMATICS, 2024, 25 (01):
  • [44] A dimensionality reduction method based on structured sparse representation for face recognition
    Guanghua Gu
    Zhichao Hou
    Chunxia Chen
    Yao Zhao
    Artificial Intelligence Review, 2016, 46 : 431 - 443
  • [45] Simultaneous dimensionality reduction and dictionary learning for sparse representation based classification
    Bao-Qing Yang
    Chao-Chen Gu
    Kai-Jie Wu
    Tao Zhang
    Xin-Ping Guan
    Multimedia Tools and Applications, 2017, 76 : 8969 - 8990
  • [46] Sparse dimensionality reduction approaches in Mendelian randomisation with highly correlated exposures
    Karageorgiou, Vasileios
    Gill, Dipender
    Bowden, Jack
    Zuber, Verena
    ELIFE, 2023, 12
  • [47] Simultaneous dimensionality reduction and dictionary learning for sparse representation based classification
    Yang, Bao-Qing
    Gu, Chao-Chen
    Wu, Kai-Jie
    Zhang, Tao
    Guan, Xin-Ping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (06) : 8969 - 8990
  • [48] Kernel Sparse Representation Based Dimensionality Reduction with Applications to Image Classification
    Zhang, Di
    He, Jiazhong
    Zhao, Yun
    ICIIP'18: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING, 2018, : 95 - 100
  • [49] Research on Dimensionality Reduction based on Neighborhood Preserving Embedding and Sparse Representation
    Wu Di
    Zhao Zheng
    INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS II, PTS 1-3, 2011, 58-60 : 547 - 550
  • [50] Dimensionality reduction of hyperspectral images based on sparse discriminant manifold embedding
    Huang, Hong
    Luo, Fulin
    Liu, Jiamin
    Yang, Yaqiong
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2015, 106 : 42 - 54