Online optimization for variable selection in data streams

Cited by: 7
Authors
Anagnostopoulos, Christoforos [1 ]
Tasoulis, Dimitris [1 ]
Hand, David J. [1 ]
Adams, Niall M.
Affiliation
[1] Univ London Imperial Coll Sci Technol & Med, Inst Math Sci, London SW7 2PG, England
Source
ECAI 2008, PROCEEDINGS | 2008 / Vol. 178
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
DOI
10.3233/978-1-58603-891-5-132
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Variable selection for regression is a classical statistical problem, motivated by concerns that too many covariates invite overfitting. Existing approaches notably include a class of convex optimisation techniques, such as the Lasso algorithm. Such techniques are invariably reliant on assumptions that are unrealistic in streaming contexts, namely that the data is available off-line and the correlation structure is static. In this paper, we relax both these constraints, proposing for the first time an online implementation of the Lasso algorithm with exponential forgetting. We also optimise the model dimension and the speed of forgetting in an online manner, resulting in a fully automatic scheme. In simulations our scheme improves on recursive least squares in dynamic environments, while also featuring model discovery and changepoint detection capabilities.
Pages: 132 / +
Page count: 2