A Sampling Theory Perspective of Graph-Based Semi-Supervised Learning

被引:38
作者
Anis, Aamir [1 ,2 ]
El Gamal, Aly [3 ]
Avestimehr, A. Salman [4 ]
Ortega, Antonio [4 ]
机构
[1] Univ Southern Calif, Los Angeles, CA 90089 USA
[2] Google Inc, Mountain View, CA 94043 USA
[3] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[4] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
基金
芬兰科学院;
关键词
Graph signal processing; graph sampling theory; semi-supervised learning;
D O I
10.1109/TIT.2018.2879897
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graph-based methods have been quite successful in solving unsupervised and semi-supervised learning problems, as they provide a means to capture the underlying geometry of the dataset. It is often desirable for the constructed graph to satisfy two properties: first, data points that are similar in the feature space should be strongly connected on the graph, and second, the class label information should vary smoothly with respect to the graph, where smoothness is measured using the spectral properties of the graph Laplacian matrix. Recent works have justified some of these smoothness conditions by showing that they are strongly linked to the semi-supervised smoothness assumption and its variants. In this work, we reinforce this connection by viewing the problem from a graph sampling theoretic perspective, where class indicator functions are treated as bandlimited graph signals (in the eigenvector basis of the graph Laplacian) and label prediction as a bandlimited reconstruction problem. Our approach involves analyzing the bandwidth of class indicator signals generated from statistical data models with separable and nonseparable classes. These models are quite general and mimic the nature of most real-world datasets. Our results show that in the asymptotic limit, the bandwidth of any class indicator is also closely related to the geometry of the dataset. This allows one to theoretically justify the assumption of bandlimitedness of class indicator signals, thereby providing a sampling theoretic interpretation of graph-based semi-supervised classification.
引用
收藏
页码:2322 / 2342
页数:21
相关论文
共 32 条
[1]  
Anis Aamir, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), P3864, DOI 10.1109/ICASSP.2014.6854325
[2]   Efficient Sampling Set Selection for Bandlimited Graph Signals Using Graph Spectral Proxies [J].
Anis, Aamir ;
Gadde, Akshay ;
Ortega, Antonio .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2016, 64 (14) :3775-3789
[3]  
Anis A, 2015, INT CONF ACOUST SPEE, P5461, DOI 10.1109/ICASSP.2015.7179015
[4]  
[Anonymous], THESIS
[5]  
[Anonymous], 2011, the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
[6]  
[Anonymous], 2009, P 26 ANN INT C MACH, DOI DOI 10.1145/1553374.1553385
[7]   THE NORMALIZED GRAPH CUT AND CHEEGER CONSTANT: FROM DISCRETE TO CONTINUOUS [J].
Arias-Castro, Ery ;
Pelletier, Bruno ;
Pudlo, Pierre .
ADVANCES IN APPLIED PROBABILITY, 2012, 44 (04) :907-937
[8]   Semi-supervised learning on Riemannian manifolds [J].
Belkin, M ;
Niyogi, P .
MACHINE LEARNING, 2004, 56 (1-3) :209-239
[9]  
Belkin M., 2003, Advances in Neural Information Processing Systems, P953
[10]   Towards a theoretical foundation for Laplacian-based manifold methods [J].
Belkin, Mikhail ;
Niyogi, Partha .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2008, 74 (08) :1289-1308