GRAIN: Improving Data Efficiency of Graph Neural Networks via Diversified Influence Maximization

Cited by: 19

Authors:

Zhang, Wentao [1,2,3]

Yang, Zhi [1,2,4,5]

Wang, Yexin [1,2]

Shen, Yu [1,2]

Li, Yang [1,2]

Wang, Liang [1,2]

Cui, Bin [1,2,4,5]

Affiliations:

[1] Peking Univ, Sch EECS, Beijing, Peoples R China

[2] Peking Univ, Key Lab High Confidence Software Technol, Beijing, Peoples R China

[3] Tencent Inc, Shenzhen, Peoples R China

[4] Peking Univ, Ctr Data Sci, Beijing, Peoples R China

[5] Natl Engn Lab Big Data Anal & Applicat, Beijing, Peoples R China

Source:

Proceedings of the VLDB Endowment | 2021 / Vol. 14 / No. 11

DOI:

10.14778/3476249.3476295

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Data selection methods, such as active learning and core-set selection, are useful tools for improving the data efficiency of deep learning models on large-scale datasets. However, recent deep learning models have moved forward from independent and identically distributed data to graph-structured data, such as social networks, e-commerce user-item graphs, and knowledge graphs. This evolution has led to the emergence of Graph Neural Networks (GNNs) that go beyond the models existing data selection methods are designed for. Therefore, we present Grain, an efficient framework that opens up a new perspective through connecting data selection in GNNs with social influence maximization. By exploiting the common patterns of GNNs, Grain introduces a novel feature propagation concept, a diversified influence maximization objective with novel influence and diversity functions, and a greedy algorithm with an approximation guarantee into a unified framework. Empirical studies on public datasets demonstrate that Grain significantly improves both the performance and efficiency of data selection (including active learning and core-set selection) for GNNs. To the best of our knowledge, this is the first attempt to bridge two largely parallel threads of research, data selection and social influence maximization, in the setting of GNNs, paving new ways for improving data efficiency.
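
The greedy selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Grain's actual objective or algorithm: influence is simplified here to plain set coverage, the diversity term is omitted, and the names `greedy_select` and `toy_influence` are hypothetical. For monotone submodular objectives such as coverage, the classic greedy rule is known to achieve a (1 - 1/e) approximation; the abstract itself does not state which bound Grain obtains.

```python
from typing import Dict, Hashable, List, Set


def greedy_select(
    influence: Dict[Hashable, Set[Hashable]], budget: int
) -> List[Hashable]:
    """Greedily pick up to `budget` nodes that maximize the number of covered nodes.

    `influence[v]` is an ASSUMED, precomputed set of nodes that labeling `v`
    would influence. A real system would derive it from feature propagation
    on the graph; here it is simply given as input.
    """
    selected: List[Hashable] = []
    covered: Set[Hashable] = set()

    for _ in range(budget):
        best_node = None
        best_gain = 0
        # Scan candidates in insertion order; ties keep the earliest node.
        for node, reach in influence.items():
            if node in selected:
                continue
            gain = len(reach - covered)  # marginal coverage gain of this node
            if gain > best_gain:
                best_node, best_gain = node, gain
        if best_node is None:
            break  # no remaining candidate adds new coverage
        selected.append(best_node)
        covered |= influence[best_node]

    return selected


# Hypothetical toy input: each node maps to the nodes it would influence.
toy_influence = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "d": {"d", "e"},
    "e": {"e"},
}
```

With `toy_influence` and a budget of 2, the sketch first picks `"a"` (three newly covered nodes) and then `"d"` (two more), after which no candidate adds coverage. Stopping early when the marginal gain reaches zero is a design choice of this sketch, not something the abstract specifies.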

Pages: 2473-2482

Page count: 10
