Model-based regression clustering for high-dimensional data: application to functional data

Cited by: 13
Author
Devijver, Emilie [1]
Affiliation
[1] Univ Paris Sud, Inria Select, Bat 425, F-91405 Orsay, France
Keywords
Model-based clustering; Regression; High-dimension; Functional data; Mixture regression; Selection
DOI
10.1007/s11634-016-0242-1
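Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714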
Abstract
Finite mixture regression models are useful for modeling the relationship between a response and predictors arising from different subpopulations. In this article, we study high-dimensional predictors and a high-dimensional response, and propose two procedures to cluster observations according to the link between the predictors and the response. To reduce the dimension, we propose to use the Lasso estimator, which takes the sparsity into account, and a maximum likelihood estimator penalized by the rank, which takes the matrix structure into account. To choose the number of components and the sparsity level, we construct a collection of models by varying these two parameters, and we select a model from this collection with a non-asymptotic criterion. We extend these procedures to functional data, where predictors and responses are functions, using a wavelet-based approach. For each situation, we provide algorithms and evaluate our methods on both simulated and real datasets to understand how they work in practice.
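As a rough illustration of the Lasso-penalized mixture regression idea described in the abstract, the sketch below fits a mixture of K sparse linear regressions with an ECM-style loop. It is not the paper's procedure: it assumes a univariate response, uses a plain l1 penalty on each component's coefficients, and omits the rank penalty, the wavelet step, and the model selection criterion. The names `soft_threshold`, `weighted_lasso`, and `fit_lasso_mixture` are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, beta, n_sweeps=50):
    """Coordinate descent for 0.5 * sum_i w_i (y_i - x_i.beta)^2 + lam * ||beta||_1."""
    beta = beta.copy()
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            xj = X[:, j]
            denom = np.sum(w * xj * xj)
            if denom == 0.0:
                continue
            # Weighted correlation of column j with the partial residual.
            rho = np.sum(w * xj * (resid + xj * beta[j]))
            new = soft_threshold(rho, lam) / denom
            resid += xj * (beta[j] - new)
            beta[j] = new
    return beta

def fit_lasso_mixture(X, y, K, lam, n_iter=100, seed=0):
    """ECM-style fit of a K-component Gaussian mixture of l1-penalized regressions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau = rng.dirichlet(np.ones(K), size=n)  # random initial responsibilities
    beta = np.zeros((K, p))
    sigma2 = np.full(K, np.var(y))
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Conditional M-steps: proportions, then coefficients, then variances.
        for k in range(K):
            t = tau[:, k]
            pi[k] = t.mean()
            beta[k] = weighted_lasso(X, y, t / sigma2[k], lam, beta[k])
            r = y - X @ beta[k]
            sigma2[k] = max(np.sum(t * r * r) / max(t.sum(), 1e-12), 1e-8)
        # E-step: posterior membership probabilities, computed in log space.
        mu = X @ beta.T
        log_dens = -0.5 * (np.log(2 * np.pi * sigma2) + (y[:, None] - mu) ** 2 / sigma2)
        log_w = np.log(pi + 1e-300) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
    return pi, beta, sigma2, tau

# Toy data (made up for illustration): two subpopulations with opposite sparse effects.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
z = rng.integers(0, 2, size=n)
true_beta = np.zeros((2, p))
true_beta[0, 0], true_beta[1, 0] = 2.0, -2.0
y = np.einsum("ij,ij->i", X, true_beta[z]) + 0.1 * rng.normal(size=n)

pi_hat, beta_hat, sigma2_hat, tau_hat = fit_lasso_mixture(X, y, K=2, lam=5.0)
clusters = tau_hat.argmax(axis=1)  # hard assignment by maximum posterior probability
```

In the spirit of the abstract, a collection of models could be built by calling `fit_lasso_mixture` over a grid of `K` and `lam` values and then comparing the fits with a selection criterion; that step is left out here.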
Pages: 243-279
Page count: 37
Related papers
50 in total
  • [1] Model-based clustering of high-dimensional data: A review
    Bouveyron, Charles
    Brunet-Saumard, Camille
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 52 - 78
  • [2] MODEL-BASED CLUSTERING OF HIGH-DIMENSIONAL DATA IN ASTROPHYSICS
    Bouveyron, C.
    STATISTICS FOR ASTROPHYSICS: CLUSTERING AND CLASSIFICATION, 2016, 77 : 91 - 119
  • [3] Variable selection for model-based high-dimensional clustering and its application to microarray data
    Wang, Sijian
    Zhu, Ji
    BIOMETRICS, 2008, 64 (02) : 440 - 448
  • [4] Model-based clustering of high-dimensional longitudinal data via regularization
    Yang, Luoying
    Wu, Tong Tong
    BIOMETRICS, 2023, 79 (02) : 761 - 774
  • [5] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [6] Model based clustering of high-dimensional binary data
    Tang, Yang
    Browne, Ryan P.
    McNicholas, Paul D.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 87 : 84 - 101
  • [7] Supervised model-based visualization of high-dimensional data
    Kontkanen, Petri
    Lahtinen, Jussi
    Myllymäki, Petri
    Silander, Tomi
    Tirri, Henry
    Intelligent Data Analysis, 2000, 4 (3-4) : 213 - 227
  • [8] Model-based clustering of high-dimensional data: Variable selection versus facet determination
    Poon, Leonard K. M.
    Zhang, Nevin L.
    Liu, Tengfei
    Liu, April H.
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2013, 54 (01) : 196 - 215
  • [9] Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
    Anastasios Bellas
    Charles Bouveyron
    Marie Cottrell
    Jérôme Lacaille
    Advances in Data Analysis and Classification, 2013, 7 : 281 - 300
  • [10] Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
    Bellas, Anastasios
    Bouveyron, Charles
    Cottrell, Marie
    Lacaille, Jerome
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2013, 7 (03) : 281 - 300