Continual Learning of Generative Models With Limited Data: From Wasserstein-1 Barycenter to Adaptive Coalescence

Cited by: 2
Authors
Dedeoglu, Mehmet [1 ]
Lin, Sen [2 ]
Zhang, Zhaofeng [1 ]
Zhang, Junshan [3 ]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
[2] Ohio State Univ, AI EDGE Inst, Columbus, OH 43210 USA
[3] Univ Calif Davis, Dept Elect & Comp Engn, Davis, CA 95616 USA
Keywords
Adaptation models; Data models; Computational modeling; Optimization; Solid modeling; Task analysis; Servers; Continual learning; generative adversarial networks (GANs); optimal transport theory; Wasserstein barycenters; optimal transport
DOI
10.1109/TNNLS.2023.3251096
CLC number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning generative models is challenging for a network edge node with limited data and computing power. Since tasks in similar environments tend to share similar models, it is plausible to leverage pretrained generative models from other edge nodes. Appealing to optimal transport theory tailored toward Wasserstein-1 generative adversarial networks (WGANs), this study aims to develop a framework that systematically optimizes continual learning of generative models using local data at the edge node while exploiting adaptive coalescence of pretrained generative models. Specifically, by treating the knowledge transfer from other nodes as Wasserstein balls centered around their pretrained models, continual learning of generative models is cast as a constrained optimization problem, which is further reduced to a Wasserstein-1 barycenter problem. A two-stage approach is devised accordingly: 1) the barycenters among the pretrained models are computed offline, where displacement interpolation is used as the theoretic foundation for finding adaptive barycenters via a "recursive" WGAN configuration and 2) the barycenter computed offline is used as metamodel initialization for continual learning, and then, fast adaptation is carried out to find the generative model using the local samples at the target edge node. Finally, a weight ternarization method, based on joint optimization of weights and threshold for quantization, is developed to compress the generative model further. Extensive experimental studies corroborate the effectiveness of the proposed framework.
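Two ingredients of the abstract can be sketched with toy NumPy helpers: the Wasserstein-1 metric (which on the real line, for equal-size empirical samples, reduces to the mean absolute difference of sorted samples) and weight ternarization with a threshold. This is a minimal illustration, not the paper's method: the function names `w1_empirical` and `ternarize` are invented here, the closed-form scale `alpha` (mean magnitude of surviving weights) is a common choice assumed for simplicity, and the threshold `delta` is taken as given rather than jointly optimized as in the paper.

```python
import numpy as np

def w1_empirical(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    On the real line, the optimal transport plan matches order statistics,
    so W1 is the mean absolute difference of the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "sketch assumes equal-size samples"
    return float(np.mean(np.abs(x - y)))

def ternarize(w, delta):
    """Quantize weights to {-alpha, 0, +alpha} using threshold delta.

    Weights with |w| <= delta are zeroed; the rest are mapped to
    +/- alpha, where alpha is the mean magnitude of surviving weights.
    """
    w = np.asarray(w, dtype=float)
    mask = np.abs(w) > delta
    alpha = float(np.mean(np.abs(w[mask]))) if mask.any() else 0.0
    return alpha * np.sign(w) * mask

# Example: shifting a sample by 1 moves it W1-distance 1.
print(w1_empirical([0, 1, 2], [1, 2, 3]))   # 1.0
print(ternarize([0.9, -0.8, 0.1], 0.5))     # [ 0.85 -0.85  0.  ]
```

The paper's setting is far richer (barycenters over distributions, not scalar samples, and joint weight/threshold optimization), but these helpers convey the two primitives.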
Pages: 12042–12056
Page count: 15