Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions

Cited by: 0

Authors:

Diakonikolas, Ilias ^{[1]}

Karmalkar, Sushrut ^{[1]}

Park, Jongho ^{[2]}

Tzamos, Christos ^{[1]}

Affiliations:

[1] Univ Wisconsin Madison, Madison, WI 53706 USA

[2] KRAFTON Inc, Seongnam, South Korea

Source:

THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195 | 2023 / Vol. 195

Keywords:

Oblivious noise; Regression; Generalized Linear Models;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$, satisfying $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Our goal is to accurately recover a function $g(w \cdot x)$ with arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles the problem in its most general distribution-independent setting, where the solution may not be identifiable. The algorithm is designed to return the solution if it is identifiable, and otherwise return a small list of candidates, one of which is close to the true solution. Furthermore, we characterize a necessary and sufficient condition for identifiability, which holds in broad settings. The problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first result for GLM regression which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression with oblivious noise, and giving algorithms under more restrictive assumptions.
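To make the noise model in the abstract concrete, the following is a minimal simulation sketch of $y = g(w^* \cdot x) + \xi + \epsilon$. It is not the authors' algorithm. The function names `sample_glm_oblivious` and `mean_abs_error`, the ReLU choice for $g$, the Gaussian distribution of $x$, the uniform corruption range, and the values of `alpha` and `sigma` are all assumptions made for illustration; the abstract fixes none of them.

```python
import random


def relu(t):
    """Illustrative link function g (assumption; the abstract leaves g general)."""
    return max(0.0, t)


def sample_glm_oblivious(n, w_star, g=relu, alpha=0.2, sigma=0.1, seed=0):
    """Draw n examples (x, y) with y = g(w* . x) + xi + eps.

    xi is oblivious: it is drawn independently of x. Here it equals 0 with
    probability alpha and is a large uniform shift otherwise (the shift
    distribution is an arbitrary illustrative choice). eps ~ N(0, sigma^2).
    Returns (xs, ys, clean), where clean[i] = g(w* . x_i).
    """
    rng = random.Random(seed)
    xs, ys, clean = [], [], []
    for _ in range(n):
        # Gaussian covariates are an assumption; the paper's setting is
        # distribution-independent.
        x = [rng.gauss(0.0, 1.0) for _ in w_star]
        signal = g(sum(wj * xj for wj, xj in zip(w_star, x)))
        xi_noise = 0.0 if rng.random() < alpha else rng.uniform(-50.0, 50.0)
        eps = rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(signal + xi_noise + eps)
        clean.append(signal)
    return xs, ys, clean


def mean_abs_error(w, xs, targets, g=relu):
    """Average |g(w . x) - target| over the sample."""
    total = 0.0
    for x, t in zip(xs, targets):
        pred = g(sum(wj * xj for wj, xj in zip(w, x)))
        total += abs(pred - t)
    return total / len(xs)


if __name__ == "__main__":
    w_star = [1.0, -2.0, 0.5]
    xs, ys, clean = sample_glm_oblivious(2000, w_star)
    # Error against the true values g(w* . x) versus the noisy measurements y.
    print("error vs. true values:", mean_abs_error(w_star, xs, clean))
    print("error vs. noisy y:    ", mean_abs_error(w_star, xs, ys))
```

With `alpha=0.2`, roughly 80% of the samples in this sketch carry a nonzero oblivious shift, so even the true parameter $w^*$ fits the noisy measurements $y$ poorly while matching the true values exactly. This is why the stated goal measures error against $g(w^* \cdot x)$ rather than against $y$.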


Pages: 23

