An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning

Cited by: 10
Authors
Jose, Sharu Theresa [1 ]
Simeone, Osvaldo [1 ]
Affiliations
[1] King's College London, Department of Engineering, King's Communications, Learning & Information Processing (KCLIP) Lab, London, England
Source
2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT) | 2021
Funding
European Research Council
Keywords
DOI
10.1109/ISIT45174.2021.9517767
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The statistical properties of the task environment thus dictate the similarity of the tasks. The goal of the meta-learner is to ensure that the hyperparameters yield a small loss when applied to the training of a new task sampled from the task environment. The difference between the resulting average loss, known as the meta-population loss, and the corresponding empirical loss measured on the available data from related tasks, known as the meta-generalization gap, is a measure of the generalization capability of the meta-learner. In this paper, we present novel information-theoretic bounds on the average absolute value of the meta-generalization gap. Unlike prior work [2], our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap. Task similarity is gauged via the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate the proposed bounds on the example of ridge regression with a meta-learned bias.
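The closing example, ridge regression with a meta-learned bias, can be illustrated with a toy sketch. This does not reproduce the paper's construction or its bounds; the task environment, dimensions, regularization weight, and the choice of meta-learning the bias as an average of per-task least-squares solutions are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_with_bias(X, y, u, lam):
    # Closed-form minimizer of ||y - Xw||^2 + lam * ||w - u||^2,
    # i.e. ridge regression shrinking toward the bias vector u.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * u)

# Toy task environment: each task's true weight vector is drawn around a
# shared environment mean, so tasks are similar; per-task data are noisy
# linear observations. All numbers below are illustrative choices.
d, n, lam = 5, 20, 5.0
w_env = rng.normal(size=d)  # environment mean (unknown to the learner)

def sample_task(spread=0.1):
    w_true = w_env + spread * rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y, w_true

# Meta-learn the bias u as the average of per-task least-squares solutions
# over the observed tasks (a simple stand-in for a meta-training procedure).
tasks = [sample_task() for _ in range(50)]
u = np.mean([np.linalg.lstsq(X, y, rcond=None)[0] for X, y, _ in tasks], axis=0)

# On a new task from the same environment, shrinking toward the meta-learned
# bias should recover the true weights better than shrinking toward zero.
X, y, w_true = sample_task()
err_meta = np.linalg.norm(ridge_with_bias(X, y, u, lam) - w_true)
err_zero = np.linalg.norm(ridge_with_bias(X, y, np.zeros(d), lam) - w_true)
```

Because the tasks share a common environment mean, the meta-learned bias `u` lands close to every new task's true weights, so the bias term of the ridge estimator helps rather than hurts; with a mismatched (dissimilar) environment the advantage would shrink, which is the intuition the paper's similarity-dependent bounds formalize.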
Pages: 1534-1539
Number of pages: 6
Related papers
29 items in total
  • [1] Aminian G., 2020, IEEE Information Theory Workshop
  • [2] Amit R., 2018, Proc. Int. Conf. on Machine Learning, p. 205
  • [3] Baxter J., "A model of inductive bias learning," Journal of Artificial Intelligence Research, 2000, vol. 12, pp. 149-198
  • [4] Ben-David S., 2007, NeurIPS, p. 137
  • [5] Bu Y., 2019, IEEE Int. Symp. on Information Theory, p. 587, DOI 10.1109/ISIT.2019.8849590
  • [6] Denevi G., 2018, Proc. 34th Conf. on Uncertainty in Artificial Intelligence
  • [7] Denevi G., 2019, Proc. Machine Learning Research, p. 1566
  • [8] Denevi G., 2020, Advances in Neural Information Processing Systems, p. 964
  • [9] Finn C., 2017, Proc. Machine Learning Research, vol. 70
  • [10] Jose S. T., 2020, arXiv:2010.09484