Tensor envelope mixture model for simultaneous clustering and multiway dimension reduction

Cited by: 8
Authors
Deng, Kai [1]
Zhang, Xin [1]
Affiliations
[1] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Funding
U.S. National Science Foundation
Keywords
clustering; dimension reduction; envelope; mixture models; tensor data analysis; MAXIMUM-LIKELIHOOD; VARIABLE SELECTION; GENE-EXPRESSION; ALGORITHM;
DOI
10.1111/biom.13486
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
In the form of multidimensional arrays, tensor data have become increasingly prevalent in modern scientific studies and biomedical applications such as computational biology, brain imaging analysis, and process monitoring systems. These data are intrinsically heterogeneous with complex dependencies and structure. Therefore, ad-hoc dimension reduction methods on tensor data may lack statistical efficiency and can obscure essential findings. Model-based clustering is a cornerstone of multivariate statistics and unsupervised learning; however, existing methods and algorithms are not designed for tensor-variate samples. In this article, we propose a tensor envelope mixture model (TEMM) for simultaneous clustering and multiway dimension reduction of tensor data. TEMM incorporates tensor-structure-preserving dimension reduction into mixture modeling and drastically reduces the number of free parameters and estimative variability. An expectation-maximization-type algorithm is developed to obtain likelihood-based estimators of the cluster means and covariances, which are jointly parameterized and constrained onto a series of lower dimensional subspaces known as the tensor envelopes. We demonstrate the encouraging empirical performance of the proposed method in extensive simulation studies and a real data application in comparison with existing vector and tensor clustering methods.
Pages: 1067-1079
Page count: 13