node2hash: Graph aware deep semantic text hashing

Cited by: 6
Authors
Chaidaroon, Suthee [1]
Park, Dae Hoon [2]
Chang, Yi [3]
Fang, Yi [1]
Affiliations
[1] Santa Clara Univ, Santa Clara, CA 95053 USA
[2] Huawei Res Amer, Santa Clara, CA USA
[3] Jilin Univ, Changchun, Peoples R China
Keywords
Semantic hashing; Variational autoencoder; Deep learning
DOI
10.1016/j.ipm.2019.102143
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Semantic hashing is an effective method for fast similarity search: it maps high-dimensional data to compact binary codes that preserve the semantic information of the original data. Most existing text hashing approaches treat each document separately and learn hash codes only from document content. In reality, however, documents are related to each other, either explicitly through observed links such as citations or implicitly through unobserved connections such as adjacency in the original space. Such document relationships are pervasive in the real world, yet prior semantic hashing work has largely ignored them. In this paper, we propose node2hash, an unsupervised deep generative model for semantic text hashing that utilizes graph context. It is designed to incorporate both document content and connection information through a probabilistic formulation. Built on the deep generative modeling framework, node2hash employs deep neural networks to learn complex mappings from the original space to the hash space. Moreover, the probabilistic formulation provides a principled way to generate hash codes for unseen documents that have no connections to existing documents. In addition, node2hash can go beyond the one-hop connections of directly linked documents by considering more global graph information. We conduct comprehensive experiments on seven datasets with explicit and implicit connections. The results demonstrate the effectiveness of node2hash over competitive baselines.
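The retrieval step that semantic hashing enables can be illustrated with a minimal sketch. It does not reproduce node2hash or any part of its model; it only shows how binary codes, once produced by some hashing model, support similarity search via Hamming distance. The function names, document identifiers, and 8-bit codes below are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest_by_hamming(
    query: int, codes: Dict[str, int], k: int = 2
) -> List[Tuple[str, int]]:
    """Return the k documents whose codes are closest to the query code.

    Ties are broken by document id so the ordering is deterministic.
    """
    ranked = sorted(
        ((doc_id, hamming_distance(query, code)) for doc_id, code in codes.items()),
        key=lambda pair: (pair[1], pair[0]),
    )
    return ranked[:k]


# Hypothetical 8-bit hash codes; a trained hashing model would produce these.
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}

# Hypothetical code for a query document.
query_code = 0b10110110

neighbors = nearest_by_hamming(query_code, codes, k=2)
print(neighbors)
```

Because the comparison reduces to an XOR followed by a bit count, the search stays cheap even for large collections, which is the practical appeal of compact binary codes.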
Pages: 15