Random Walk-based Top-k Tag Generation in Bipartite Networks of Entity-Term Type

被引:0
作者
Zhang, Mingxi [1 ]
Su, Guanying [1 ]
Wang, Wei [2 ]
机构
[1] Univ Shanghai Sci & Technol, Shanghai, Peoples R China
[2] Fudan Univ, Shanghai, Peoples R China
来源
2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019) | 2019年
基金
上海市自然科学基金;
关键词
entity-term network; tag generation; random walk with restart; relevance; SEARCH;
D O I
10.1109/ICTAI.2019.00026
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Tag generation aims to find relevant tags for a given entity, which has numerous applications, such as classification, information retrieval and recommender system. Practically, the data of real applications is sparse and lacks sufficient description for entities, which might lead to incomprehensive results. Random walk with restart (RWR) can find the hidden relationship between nodes by utilizing indirect connections. However, the traditional RWR computation is based on the whole structure of the given network, which maintains a matrix for storing all relevances between nodes. And the efficiency problem would be run into as network grows large. In this paper, we propose a top-k tag generation algorithm, namely DRWR, for efficiently generating the tags from entity-term network. The terms are treated as candidate tags, and the most relevant terms are treated as the tags for a given entity. The relevance computation between entity and terms is divided into two stages: off-line stage and online stage. In off-line stage, the relevances between terms are computed over the term-term network that is built based on the whole structure of entity-term network. In on-line stage, the relevances between entity and each term are computed based on the relevances between terms. For supporting fast on-line query processing, we develop a pruning algorithm, which skips the operations on relevances between terms smaller than a threshold. Extensive experiments on real datasets demonstrate the efficiency and effectiveness of the proposed approach.
引用
收藏
页码:125 / 132
页数:8
相关论文
共 31 条
[1]  
[Anonymous], P INT C MACH LEARN
[2]   Efficient Processing of Network Proximity Queries via Chebyshev Acceleration [J].
Coskun, Mustafa ;
Grama, Ananth ;
Koyuturk, Mehmet .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :1515-1524
[3]  
El-Ghany S.A., 2019, 2019 INT C COMP INF, P1
[4]   Fast and Exact Top-k Search for Random Walk with Restart [J].
Fujiwara, Yasuhiro ;
Nakatsuji, Makoto ;
Onizuka, Makoto ;
Kitsuregawa, Masaru .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (05) :442-453
[5]  
Gemmell J, 2009, P 7 INT C INT TECHN, V528, P69
[6]  
Holbling G., 2010, P 8 EUR C INT TV VID, P273, DOI [DOI 10.1145/1809777.1809832, 10.1145/1809777.1809832.]
[7]   Cumulated gain-based evaluation of IR techniques [J].
Järvelin, K ;
Kekäläinen, J .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2002, 20 (04) :422-446
[8]  
Jeh G., 2002, P 8 ACM SIGKDD INT C, P538
[9]  
JEH G., 2003, WWW
[10]   BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart [J].
Jung, Jinhong ;
Park, Namyong ;
Sael, Lee ;
Kang, U. .
SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, :789-804