Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec

Cited by: 670
Authors
Qiu, Jiezhong [1 ,2 ]
Dong, Yuxiao [2 ]
Ma, Hao [2 ]
Li, Jian [3 ]
Wang, Kuansan [2 ]
Tang, Jie [1 ]
Affiliations
[1] Tsinghua University, Department of Computer Science and Technology, Beijing, China
[2] Microsoft Research, Redmond, WA, USA
[3] Tsinghua University, Institute for Interdisciplinary Information Sciences, Beijing, China
Source
WSDM'18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018
DOI
10.1145/3159652.3159706
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Since the invention of word2vec [28, 29], the skip-gram model has significantly advanced research on network embedding, as evidenced by the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into a matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk [31] empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE [37] is, in theory, a special case of DeepWalk in which the size of each vertex's context is set to one; (3) as an extension of LINE, PTE [36] can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec [16] factorizes a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacians. Finally, we present the NetMF method, as well as its approximation algorithm, for computing network embeddings. Our method offers significant improvements over DeepWalk and LINE on conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
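To make the closed form concrete, below is a minimal sketch of the NetMF idea for the DeepWalk case: build the matrix M = (vol(G)/(bT)) * (sum_{r=1..T} (D^{-1}A)^r) D^{-1} that the paper derives for DeepWalk with window size T and b negative samples, apply an element-wise truncated logarithm, and factorize it with a rank-d SVD. This is an illustrative dense-NumPy sketch, not the authors' released implementation; it assumes an undirected graph with no isolated nodes, and the function name netmf_embedding and default parameter values are hypothetical.

```python
import numpy as np

def netmf_embedding(A, T=10, b=1, d=128):
    """Sketch of NetMF for DeepWalk: factorize log(max(M, 1)) where
    M = (vol(G) / (b*T)) * (sum_{r=1..T} (D^{-1} A)^r) D^{-1}.
    Assumes A is a dense, symmetric adjacency matrix with no
    zero-degree rows (illustrative, not the authors' code)."""
    n = A.shape[0]
    deg = A.sum(axis=1)                   # node degrees
    vol = deg.sum()                       # vol(G): sum of all degrees
    D_inv = np.diag(1.0 / deg)
    P = D_inv @ A                         # random-walk transition matrix
    S = np.zeros((n, n))
    P_r = np.eye(n)
    for _ in range(T):                    # accumulate P^1 + ... + P^T
        P_r = P_r @ P
        S += P_r
    M = (vol / (b * T)) * (S @ D_inv)
    M_log = np.log(np.maximum(M, 1.0))    # element-wise truncated log
    U, sigma, _ = np.linalg.svd(M_log)    # dense SVD; fine at sketch scale
    return U[:, :d] * np.sqrt(sigma[:d])  # rank-d embedding U_d * sqrt(Sigma_d)

# Example: embed a small random undirected graph.
rng = np.random.default_rng(0)
A = np.triu(rng.integers(0, 2, size=(50, 50)), 1)
A = (A + A.T).astype(float)               # symmetric, zero diagonal
emb = netmf_embedding(A, T=5, b=1, d=16)  # one 16-dim vector per node
print(emb.shape)                          # (50, 16)
```

The paper's actual method also includes an approximation algorithm (replacing the exact T-step sum with a spectral approximation of the normalized adjacency matrix) so that the computation scales to large sparse networks; the dense sum above is only for exposition.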
Pages: 459-467
Page count: 9
References
49 in total
[1] [Anonymous], 2010, Proceedings of the 19th International Conference on World Wide Web, DOI 10.1145/1772690.1772755
[2] [Anonymous], NIPS
[3] [Anonymous], 2016, Transactions of the Association for Computational Linguistics, DOI 10.1162/tacl_a_00106
[4] [Anonymous], 2016, IJCAI
[5] [Anonymous], DATABASE
[6] [Anonymous], 2003, ICML
[7] [Anonymous], 2017, KDD
[8] [Anonymous], 2016, Transactions of the Association for Computational Linguistics, DOI 10.1162/tacl_a_00098
[9] [Anonymous], 2009, KDD
[10] [Anonymous], 2015, Proceedings of the 24th ACM International Conference on Information and Knowledge Management