COPA: Constrained PARAFAC2 for Sparse & Large Datasets

Cited by: 25
Authors
Afshar, Ardavan [1]
Perros, Ioakeim [1]
Papalexakis, Evangelos E. [2]
Searles, Elizabeth [3]
Ho, Joyce [4]
Sun, Jimeng [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Univ Calif Riverside, Riverside, CA 92521 USA
[3] Childrens Healthcare Atlanta, Atlanta, GA USA
[4] Emory Univ, Atlanta, GA 30322 USA
Source
CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2018
Funding
National Science Foundation (USA);
Keywords
Tensor Factorization; Unsupervised Learning; Computational Phenotyping;
DOI
10.1145/3269206.3271775
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with a varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity, and non-negativity, need to be imposed for interpretable temporal modeling, and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36x faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.
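The abstract describes handling constraints such as non-negativity and sparsity through ADMM inside an alternating-optimization loop. As a rough illustration only, and not the authors' COPA implementation, the sketch below shows how ADMM can solve a single constrained least-squares subproblem of the kind that arises when one factor is updated with the others held fixed. The function names, the `rho` penalty, and the iteration count are assumptions chosen for this example.

```python
import numpy as np

def project_nonneg(V, rho):
    """Proximal operator of the non-negativity constraint: clip at zero."""
    return np.maximum(V, 0.0)

def soft_threshold(lam):
    """Return the proximal operator of lam * ||Z||_1 (promotes sparsity)."""
    def prox(V, rho):
        return np.sign(V) * np.maximum(np.abs(V) - lam / rho, 0.0)
    return prox

def admm_constrained_ls(A, B, prox=project_nonneg, rho=1.0, n_iter=200):
    """Approximately solve  min_X 0.5 * ||A X - B||_F^2 + g(X)  with ADMM,
    where g is handled only through its proximal operator `prox`.

    Hypothetical helper for illustration; the splitting X = Z keeps the
    least-squares step and the constraint step separate.
    """
    n = A.shape[1]
    AtA = A.T @ A
    AtB = A.T @ B
    # The system matrix is fixed across iterations, so factor it once.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    Z = np.zeros((n, B.shape[1]))   # constrained copy of X
    U = np.zeros_like(Z)            # scaled dual variable
    for _ in range(n_iter):
        rhs = AtB + rho * (Z - U)
        # Solve (AtA + rho I) X = rhs using the cached Cholesky factor.
        X = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        Z = prox(X + U, rho)        # constraint step
        U = U + X - Z               # dual update
    return Z
```

Swapping `prox` changes the constraint without touching the least-squares step, which is the property that makes this style of framework convenient when several constraints have to be supported.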
Pages: 793-802
Number of pages: 10