A multivariate linear regression analysis using finite mixtures of t distributions

Cited by: 33
Authors
Galimberti, Giuliano [1]
Soffritti, Gabriele [1]
Affiliations
[1] Univ Bologna, Dept Stat Sci, I-40126 Bologna, Italy
Keywords
EM algorithm; Maximum likelihood; Model identifiability; Non-normal error distribution; Unobserved heterogeneity; DISCRIMINANT-ANALYSIS; EM ALGORITHM; MODEL; IDENTIFIABILITY; LIKELIHOOD;
DOI
10.1016/j.csda.2013.01.017
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Recently, finite mixture models have been used to model the distribution of the error terms in multivariate linear regression analysis. In particular, Gaussian mixture models have been employed. A novel approach that assumes that the error terms follow a finite mixture of t distributions is introduced. This assumption allows for an extension of multivariate linear regression models, making these models more versatile and robust against the presence of outliers in the error term distribution. The issues of model identifiability and maximum likelihood estimation are addressed. In particular, identifiability conditions are provided and an Expectation-Maximisation algorithm for estimating the model parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo experiments and compared to the estimators from the Gaussian mixture models. Results from the analysis of two real datasets are presented. (C) 2013 Elsevier B.V. All rights reserved.
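
To make the modelling assumption in the abstract concrete, the following is a minimal sketch of how data could be simulated from a multivariate linear regression whose error terms follow a finite mixture of t distributions. It is an illustration only and not the authors' code: the function names, the diagonal scale matrices, and all parameter values are assumptions chosen for brevity, and the sketch covers simulation only, not the Expectation-Maximisation estimation described in the paper.

```python
import math
import random


def sample_t_mixture_error(rng, weights, locations, scale_sds, dofs):
    """Draw one error vector from a finite mixture of multivariate t components.

    Assumption for brevity: each component has a diagonal scale matrix,
    given by per-coordinate standard deviations in scale_sds[k].
    """
    # Pick the mixture component according to the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Gaussian scale-mixture representation of the t distribution:
    # error = location + z / sqrt(w), with z ~ N(0, scale) and
    # w ~ chi-square(nu) / nu, i.e. Gamma(shape=nu/2, scale=2/nu).
    w = rng.gammavariate(dofs[k] / 2.0, 2.0 / dofs[k])
    error = [m + rng.gauss(0.0, s) / math.sqrt(w)
             for m, s in zip(locations[k], scale_sds[k])]
    return error, k


def simulate(n, intercept, coef, mixture, seed=0):
    """Simulate (x, y, label) triples from y = intercept + coef' x + error.

    coef is a list of P rows (one per predictor), each of length D
    (one entry per response variable).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coef]
        error, k = sample_t_mixture_error(rng, **mixture)
        y = [b0 + sum(row[j] * xi for row, xi in zip(coef, x)) + e
             for j, (b0, e) in enumerate(zip(intercept, error))]
        data.append((x, y, k))
    return data


# Hypothetical two-component mixture for bivariate errors; the low
# degrees of freedom in the first component produce heavy tails.
example_mixture = dict(
    weights=[0.7, 0.3],
    locations=[[-0.6, 0.0], [1.4, 0.0]],
    scale_sds=[[1.0, 1.0], [2.0, 2.0]],
    dofs=[3.0, 10.0],
)
sample = simulate(200, intercept=[0.0, 1.0], coef=[[1.0, -1.0]],
                  mixture=example_mixture)
```

Components with small degrees of freedom generate occasional large errors, which is the kind of outlier a Gaussian mixture would need extra components to absorb.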
Pages: 138-150
Number of pages: 13