Networked Exponential Families for Big Data Over Networks

Cited by: 7
Authors
Jung, Alexander [1 ]
Affiliations
[1] Aalto Univ, Dept Comp Sci, FI-00076 Aalto, Finland
Keywords
Data models; TV; big data; probabilistic logic; machine learning; optimization; message passing; networks; statistical machine learning; federated learning; privacy-preserving machine learning; lasso; algorithms; regression; model
DOI
10.1109/ACCESS.2020.3033817
CLC Classification
TP [Automation and Computer Technology]
Subject Classification
0812
Abstract
The data generated in many application domains can be modeled as big data over networks, i.e., massive collections of high-dimensional local datasets related via an intrinsic network structure. Machine learning for big data over networks must jointly leverage the information contained in the local datasets and their network structure. We propose networked exponential families as a novel probabilistic modeling framework for machine learning from big data over networks. We interpret the high-dimensional local datasets as realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in local datasets and their network structure in order to learn a tailored model for each local dataset. We formulate the task of learning the parameters of networked exponential families as a convex optimization problem. This optimization problem is an instance of the network Lasso and enforces a data-driven pooling (or clustering) of the local datasets according to their corresponding exponential-family parameters. We derive an upper bound on the estimation error of the network Lasso. This upper bound depends on the network structure and the information geometry of the node-wise exponential families. The insights provided by this bound can be used to determine how much data must be collected or observed to ensure that the network Lasso is accurate. We also provide a scalable implementation of the network Lasso as message passing between adjacent local datasets. Such message passing is appealing for federated machine learning relying on edge computing. We finally note that the proposed method is also privacy-preserving, because only parameter estimates, not raw data, are shared among different nodes.
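The network Lasso formulation summarized in the abstract can be illustrated with a small sketch. The code below is an assumption-laden toy, not the paper's algorithm: it takes the node-wise exponential family to be scalar Gaussian (so each local negative log-likelihood reduces to a squared error against the local sample mean), couples adjacent nodes with a total-variation penalty, and minimizes the objective by plain subgradient descent rather than the scalable message-passing scheme the paper proposes. All names here are invented for illustration.

```python
import numpy as np

def network_lasso_subgradient(local_means, edges, lam=1.0, lr=0.05, steps=500):
    """Toy network Lasso (illustrative only, not the paper's method).

    Minimizes  sum_i (theta_i - m_i)^2 + lam * sum_{(i,j)} |theta_i - theta_j|
    by subgradient descent with a decaying step size, where m_i is the local
    sample mean at node i and `edges` lists undirected (i, j) pairs.
    """
    m = np.asarray(local_means, dtype=float)
    theta = m.copy()                       # warm start at the local estimates
    for t in range(steps):
        grad = 2.0 * (theta - m)           # gradient of the smooth local losses
        for i, j in edges:                 # subgradient of the TV penalty
            s = np.sign(theta[i] - theta[j])
            grad[i] += lam * s
            grad[j] -= lam * s
        theta -= (lr / np.sqrt(t + 1.0)) * grad
    return theta

# Two well-separated clusters; edges exist only within each cluster, so a
# sufficiently large lam pools the noisy local means cluster by cluster.
means = [1.0, 1.2, 0.9, 5.0, 5.1]
edges = [(0, 1), (1, 2), (3, 4)]
theta = network_lasso_subgradient(means, edges, lam=2.0)
```

With `lam=2.0` the parameters within each connected cluster are pulled together (the data-driven pooling the abstract describes), while the two clusters, unconnected by any edge, remain apart near their respective cluster means.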
Pages: 202897-202909
Page count: 13