GRAIN: Improving Data Efficiency of Graph Neural Networks via Diversified Influence Maximization

Cited by: 19
Authors
Zhang, Wentao [1 ,2 ,3 ]
Yang, Zhi [1 ,2 ,4 ,5 ]
Wang, Yexin [1 ,2 ]
Shen, Yu [1 ,2 ]
Li, Yang [1 ,2 ]
Wang, Liang [1 ,2 ]
Cui, Bin [1 ,2 ,4 ,5 ]
Affiliations
[1] Peking Univ, Sch EECS, Beijing, Peoples R China
[2] Peking Univ, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[3] Tencent Inc, Shenzhen, Peoples R China
[4] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[5] Natl Engn Lab Big Data Anal & Applicat, Beijing, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2021, Vol. 14, No. 11
Keywords
DOI
10.14778/3476249.3476295
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Data selection methods, such as active learning and core-set selection, are useful tools for improving the data efficiency of deep learning models on large-scale datasets. However, recent deep learning models have moved beyond independent and identically distributed data to graph-structured data, such as social networks, e-commerce user-item graphs, and knowledge graphs. This evolution has led to the emergence of Graph Neural Networks (GNNs), which go beyond the models that existing data selection methods are designed for. Therefore, we present Grain, an efficient framework that opens up a new perspective by connecting data selection in GNNs with social influence maximization. By exploiting the common patterns of GNNs, Grain introduces a novel feature propagation concept, a diversified influence maximization objective with novel influence and diversity functions, and a greedy algorithm with an approximation guarantee into a unified framework. Empirical studies on public datasets demonstrate that Grain significantly improves both the performance and efficiency of data selection (including active learning and core-set selection) for GNNs. To the best of our knowledge, this is the first attempt to bridge two largely parallel threads of research, data selection and social influence maximization, in the setting of GNNs, paving new ways for improving data efficiency.
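The abstract describes two ingredients: GNN-style feature propagation, and a greedy algorithm with an approximation guarantee for a (diversified) influence-maximization objective. The sketch below is only a rough illustration of those two general ideas, not the paper's actual GRAIN algorithm: the function names, the row-normalized propagation, and the simple set-coverage objective standing in for the influence/diversity functions are all assumptions made for this example.

```python
import numpy as np

def propagate_features(adj, features, k=2):
    """k-step feature propagation over a row-normalized adjacency matrix,
    the smoothing operation underlying many GNNs."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0          # avoid division by zero for isolated nodes
    p = adj / deg                # row-stochastic propagation matrix
    x = features
    for _ in range(k):
        x = p @ x
    return x

def greedy_coverage(influence_sets, budget):
    """Greedy selection maximizing the number of covered nodes.

    Coverage is monotone submodular, so this greedy loop enjoys the
    classic (1 - 1/e) approximation guarantee, the same style of
    guarantee cited for influence maximization.
    """
    covered, selected = set(), []
    for _ in range(budget):
        best, best_gain = None, -1
        for v, reach in influence_sets.items():
            if v in selected:
                continue
            gain = len(reach - covered)  # marginal coverage gain of v
            if gain > best_gain:
                best, best_gain = v, gain
        selected.append(best)
        covered |= influence_sets[best]
    return selected, covered
```

In this toy form, `influence_sets[v]` would be the set of nodes whose propagated features node `v` influences; GRAIN's actual objective additionally rewards diversity among the selected nodes, which this plain coverage function does not model.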
Pages: 2473-2482
Page count: 10