Online Censoring for Large-Scale Regressions with Application to Streaming Big Data

被引：46

作者：

Berberidis, Dimitris ^{[1
]}

Kekatos, Vassilis ^{[2
]}

Giannakis, Georgios B. ^{[1
]}

机构：

[1] Univ Minnesota, Elect & Comp Engn Dept, Minneapolis, MN 55455 USA

[2] Virginia Tech, Elect & Comp Engn Dept, Blacksburg, VA 24061 USA

来源：

IEEE TRANSACTIONS ON SIGNAL PROCESSING | 2016年 / 64卷 / 15期

基金：

美国国家科学基金会;

关键词：

Parameter estimation; least squares; stochastic approximation; big data; support vector machines; data reduction; censoring; LMS; RLS; random projections; ALGORITHM;

D O I：

10.1109/TSP.2016.2546225

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first-and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non) asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.

引用

页码：3854 / 3867

页数：14

共 37 条

[1]

Agaskar A, 2014, 2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), P389, DOI 10.1109/GlobalSIP.2014.7032145

[2] TOBIT MODELS - A SURVEY [J].

AMEMIYA, T .

JOURNAL OF ECONOMETRICS, 1984, 24 (1-2) :3-61

[3] REGRESSION-ANALYSIS WHEN DEPENDENT VARIABLE IS TRUNCATED NORMAL [J].

AMEMIYA, T .

ECONOMETRICA, 1973, 41 (06) :997-1016

[4]

[Anonymous], 1993, OPTIMAL DESIGN EXPT

[5]

[Anonymous], 2013, Advances in neural information processing systems (NIPS)

[6]

[Anonymous], ELEMENTS STAT LEARNI

[7]

[Anonymous], 2015, Convex Optimization Algorithms

[8]

Bach, 2011, ADV NEURAL INFORM PR, P451

[9]

Bache K., 2013, UCI Machine Learning Repository

[10] RECURSIVE STATE ESTIMATION FOR A SET-MEMBERSHIP DESCRIPTION OF UNCERTAINTY [J].

BERTSEKAS, DP ;

RHODES, IB .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1971, AC16 (02) :117-+

← 1 2 3 4 →