Distance metric learning for graph structured data

被引:0
作者
Tomoki Yoshida
Ichiro Takeuchi
Masayuki Karasuyama
机构
[1] Nagoya Institute of Technology,
[2] National Institute for Material Science,undefined
[3] RIKEN Center for Advanced Intelligence Project,undefined
[4] Japan Science and Technology Agency,undefined
来源
Machine Learning | 2021年 / 110卷
关键词
Metric learning; Structured data; Graph mining; Convex optimization; Interpretability;
D O I
暂无
中图分类号
学科分类号
摘要
Graphs are versatile tools for representing structured data. As a result, a variety of machine learning methods have been studied for graph data analysis. Although many such learning methods depend on the measurement of differences between input graphs, defining an appropriate distance metric for graphs remains a controversial issue. Hence, we propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has a strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that can provide insight into the given classification task. Because our formulation has a large number of optimization variables, an efficient algorithm that uses pruning techniques based on safe screening and working set selection methods is also proposed. An important property of IGML is that solution optimality is guaranteed because the problem is formulated as a convex problem and our pruning strategies only discard unnecessary subgraphs. Furthermore, we show that IGML is also applicable to other structured data such as itemset and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance of IGML on several benchmark datasets and provide some illustrative examples of how IGML identifies important subgraphs from a given graph dataset.
引用
收藏
页码:1765 / 1811
页数:46
相关论文
共 86 条
[1]  
Bellet A(2012)Good edit similarity learning by loss minimization Machine Learning 89 5-35
[2]  
Habrard A(2005)A network representation of protein structures: Implications for protein stability Biophysical Journal 89 4159-4170
[3]  
Sebban M(2008)LIBLINEAR: A library for large linear classification Journal of machine learning research 9 1871-1874
[4]  
Brinda K(2007)Pathwise coordinate optimization The Annals of Applied Statistics 1 302-332
[5]  
Vishveshwara S(2010)A survey of graph edit distance Pattern Analysis and applications 13 113-129
[6]  
Fan R-E(1988)A new approach to the maximum-flow problem Journal of the ACM (JACM) 35 921-940
[7]  
Chang K-W(2005)Exact and approximate graph matching using random walks IEEE transactions on pattern analysis and machine intelligence 27 1100-1111
[8]  
Hsieh C-J(2002)A simple decomposition method for support vector machines Machine Learning 46 291-314
[9]  
Wang X-R(2013)Fast and scalable approximate spectral graph matching for correspondence problems Information Sciences 220 306-318
[10]  
Lin C-J(2012)Unsupervised learning for graph matching International journal of computer vision 96 28-45