node2hash: Graph aware deep semantic text hashing

Cited by: 6
Authors
Chaidaroon, Suthee [1]
Park, Dae Hoon [2]
Chang, Yi [3]
Fang, Yi [1]
Affiliations
[1] Santa Clara Univ, Santa Clara, CA 95053 USA
[2] Huawei Res Amer, Santa Clara, CA USA
[3] Jilin Univ, Changchun, Peoples R China
Keywords
Semantic hashing; Variational autoencoder; Deep learning;
DOI
10.1016/j.ipm.2019.102143
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Semantic hashing is an effective method for fast similarity search: it maps high-dimensional data to compact binary codes that preserve the semantic information of the original data. Most existing text hashing approaches treat each document separately and learn hash codes only from document content. In practice, however, documents are related to one another, either explicitly through observed links such as citations or implicitly through unobserved connections such as adjacency in the original space. Such relationships are pervasive in the real world, yet prior semantic hashing work has largely ignored them. In this paper, we propose node2hash, an unsupervised deep generative model for semantic text hashing that utilizes graph context. It incorporates both document content and connection information through a probabilistic formulation. Building on the deep generative modeling framework, node2hash employs deep neural networks to learn complex mappings from the original space to the hash space. Moreover, the probabilistic formulation provides a principled way to generate hash codes for unseen documents that have no connections to existing documents. In addition, node2hash can go beyond the one-hop connections of directly linked documents by considering more global graph information. We conduct comprehensive experiments on seven datasets with explicit and implicit connections. The results demonstrate the effectiveness of node2hash over competitive baselines.
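To make the retrieval side of semantic hashing concrete, here is a minimal sketch of how binary codes can be used for similarity search once some model has produced a continuous embedding per document. It is not the node2hash model itself: the random `latent` matrix stands in for learned embeddings, median thresholding is one common binarization choice assumed here for illustration, and the function names `binarize` and `hamming_search` are hypothetical.

```python
import numpy as np


def binarize(latent):
    """Turn continuous embeddings into 0/1 codes.

    Assumption: each dimension is thresholded at its median over the
    collection, so roughly half of the documents get a 1 in every bit.
    """
    thresholds = np.median(latent, axis=0)
    return (latent > thresholds).astype(np.uint8)


def hamming_search(query_code, db_codes, k):
    """Return the indices of the k codes nearest to query_code, plus all distances."""
    # Hamming distance = number of bit positions where the codes differ.
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    # A stable sort keeps lower indices first among equally distant codes.
    nearest = np.argsort(distances, kind="stable")[:k]
    return nearest, distances


# Stand-in data: 100 documents with 16-dimensional embeddings (hypothetical).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 16))

codes = binarize(latent)
top, dist = hamming_search(codes[0], codes, k=5)
```

Because the query here is the first document's own code, the nearest result is that document at Hamming distance zero. In a real system the codes would typically be packed into machine words so that distances reduce to XOR and popcount operations.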
Pages: 15