Bring Order into the Samples: A Novel Scalable Method for Influence Maximization

被引：73

作者：

Wang, Xiaoyang ^{[1
]}

Zhang, Ying ^{[3
]}

Zhang, Wenjie ^{[2
]}

Lin, Xuemin ^{[2
]}

Chen, Chen ^{[1
]}

机构：

[1] Univ New South Wales, Sydney, NSW 2052, Australia

[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia

[3] Univ Technol Sydney, QCIS, Ultimo, NSW 2007, Australia

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2017年 / 29卷 / 02期

基金：

澳大利亚研究理事会;

关键词：

Influence maximization; bottom-k; reverse influence sampling;

D O I：

10.1109/TKDE.2016.2624734

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

As a key problem in viral marketing, influence maximization has been extensively studied in the literature. Given a positive integer k, a social network G and a certain propagation model, it aims to find a set of k nodes that have the largest influence spread. The state-of-the-art method IMM is based on the reverse influence sampling (RIS) framework. By using the martingale technique, it greatly outperforms the previous methods in efficiency. However, IMM still has limitations in scalability due to the high overhead of deciding a tight sample size. In this paper, instead of spending the effort on deciding a tight sample size, we present a novel bottom-k sketch based RIS framework, namely BKRIS, which brings the order of samples into the RIS framework. By applying the sketch technique, we can derive early termination conditions to significantly accelerate the seed set selection procedure. Moreover, we provide a cost-effective method to find a proper sample size to bound the quality of returned result. In addition, we provide several optimization techniques to reduce the cost of generating samples' order and efficiently deal with the worst-case scenario. We demonstrate the efficiency and effectiveness of the proposed method over 10 real world datasets. Compared with the IMM approach, BKRIS can achieve up to two orders of magnitude speedup with almost the same influence spread. In the largest dataset with 1.8 billion edges, BKRIS can return 50 seeds in 1.3 seconds and return 5,000 seeds in 36.6 seconds. It takes IMM 55.32 second and 3,664.97 seconds, respectively.

引用

页码：243 / 256

页数：14

共 27 条

[1]

[Anonymous], 2014, P 23 ACM INT C CONFE

[2]

[Anonymous], 2011, P 20 INT C COMP WORL

[3]

[Anonymous], 2010, P 16 ACM SIGKDD INT, DOI DOI 10.1145/1835804.1835934

[4]

[Anonymous], 2003, PROC ACM SIGKDD INT

[5]

Aslay C, 2015, PROC VLDB ENDOW, V8, P822

[6]

Beyer KevinS., 2007, SIGMOD, P199

[7]

Borgs C., 2014, P 25 ANN ACM SIAM S, P946, DOI [DOI 10.1137/1.9781611973402.70, 10.1137/1.9781611973402.70]

[8] Online Topic-Aware Influence Maximization [J].

Chen, Shuo ;

Fan, Ju ;

Li, Guoliang ;

Feng, Jianhua ;

Tan, Kian-lee ;

Tang, Jinhui .

PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (06) :666-677

[9] Efficient Influence Maximization in Social Networks [J].

Chen, Wei ;

Wang, Yajun ;

Yang, Siyu .

KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, :199-207

[10]

Cohen E, 2007, PODC'07: PROCEEDINGS OF THE 26TH ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING, P225

← 1 2 3 →