Semi-Supervised Linear Regression

Times Cited: 28
Authors
Azriel, David [1 ]
Brown, Lawrence D. [2 ]
Sklar, Michael [3 ]
Berk, Richard [2 ]
Buja, Andreas [2 ]
Zhao, Linda [2 ]
Affiliations
[1] Technion Israel Inst Technol, Haifa, Israel
[2] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
[3] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Keywords
Linear regression; Misspecified models; Semi-supervised learning; Inference; Efficient
DOI
10.1080/01621459.2021.1915320
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
We study a regression problem where for part of the data we observe both the label variable (Y) and the predictors (X), while for the remaining part only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation E[Y | X] is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least-squares estimator (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that also use the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under a certain nonlinearity condition on E[Y | X]; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample sizes is investigated in an extensive simulation study. A real-data example of inferring the homeless population is used to illustrate the new methodology.
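The abstract says the method is easy to implement but gives no formula. Below is a minimal sketch of one natural semi-supervised adjustment, offered as an illustration of the general idea rather than the authors' exact estimator: the best linear approximation solves beta = E[XX']^{-1} E[XY], and since the second-moment matrix E[XX'] depends only on the predictors it can be estimated from the pooled labeled and unlabeled X's, while E[XY] is estimated from the labeled pairs only. The function name semi_supervised_ls and the simulated data are hypothetical.

```python
import numpy as np

def semi_supervised_ls(X_lab, y_lab, X_unlab):
    """Sketch of a semi-supervised least-squares estimate (with intercept).

    Assumption: E[XX'] is estimated from all predictors (labeled + unlabeled),
    E[XY] from the labeled pairs only; this is one plausible reading of the
    approach, not necessarily the estimator proposed in the paper.
    """
    # Labeled design matrix with an intercept column.
    Xl = np.column_stack([np.ones(len(X_lab)), X_lab])
    # Pooled (labeled + unlabeled) design matrix with an intercept column.
    X_all = np.vstack([X_lab, X_unlab])
    Xa = np.column_stack([np.ones(len(X_all)), X_all])
    # Second-moment matrix from all predictors, cross-moment from labeled data.
    Sxx_all = Xa.T @ Xa / len(Xa)
    Sxy_lab = Xl.T @ y_lab / len(Xl)
    return np.linalg.solve(Sxx_all, Sxy_lab)

# Hypothetical usage: a mildly nonlinear E[Y | X] and many unlabeled X's.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 2))
y_lab = 1.0 + X_lab @ np.array([2.0, -1.0]) + 0.5 * X_lab[:, 0] ** 2 + rng.normal(size=100)
X_unlab = rng.normal(size=(2000, 2))

beta_ols = np.linalg.lstsq(np.column_stack([np.ones(100), X_lab]), y_lab, rcond=None)[0]
beta_ssl = semi_supervised_ls(X_lab, y_lab, X_unlab)
print("labeled-only LSE:", beta_ols)
print("semi-supervised:", beta_ssl)
```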
Pages: 2238-2251
Number of pages: 14