Distributed Tensor Decomposition for Large Scale Health Analytics

Times cited: 17
Authors
He, Huan [1]
Henderson, Jette [2]
Ho, Joyce C. [1]
Affiliations
[1] Emory Univ, Atlanta, GA 30322 USA
[2] CognitiveScale, Austin, TX USA
Source
WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019) | 2019
Funding
U.S. National Science Foundation (NSF)
Keywords
Web Mining; User-Generated Content; Health Analytics; Tensor Decomposition; Distributed Algorithm; Apache Spark
DOI
10.1145/3308558.3313548
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In the past few decades, there has been rapid growth in the quantity and variety of healthcare data. These large datasets are usually high-dimensional (e.g., patients, their diagnoses, and the medications used to treat those diagnoses) and cannot be adequately represented as matrices, so many existing algorithms cannot analyze them. To accommodate such high-dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is computationally expensive, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors; (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition; and (3) Flexible constraints: we show that our approach can encompass various kinds of constraints, including ℓ2-norm, ℓ1-norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets (i.e., phenotypes) of patients from electronic health records. Through these case studies, we show that SGranite has the potential to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
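To make the abstract's ingredients concrete (a sparse tensor, a factorization fit by stochastic gradient descent, block partitioning for parallelism, and ℓ2/ℓ1 penalties), here is a minimal sketch under stated assumptions. It is not the SGranite implementation, which is distributed on Apache Spark. It assumes a 3-mode tensor stored as a dict of observed cells and a rank-R CP (CANDECOMP/PARAFAC) model. It also assumes a stratified block schedule in the style of DSGD for matrix factorization: each mode is split into `n_blocks` index ranges, and blocks within a stratum share no range in any mode, so they could be updated in parallel. All names (`cp_sgd`, `make_strata`, `rmse`, `soft_threshold`) are hypothetical. The ℓ1 option uses a proximal soft-threshold step, one common way to combine ℓ1 with SGD. Logistic regularization is not shown.

```python
import random

# Hypothetical sketch (not the SGranite code): rank-R CP factorization of a
# sparse 3-mode tensor, fit by SGD over stratified blocks, run sequentially.


def soft_threshold(x, t):
    """Proximal operator of t*|x|, used for the optional l1 penalty."""
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)


def predict(factors, idx):
    """CP model value at (i, j, k): sum_r A[i][r] * B[j][r] * C[k][r]."""
    a, b, c = (factors[m][idx[m]] for m in range(3))
    return sum(x * y * z for x, y, z in zip(a, b, c))


def rmse(entries, factors):
    """Root-mean-square error over the observed cells only."""
    se = sum((predict(factors, idx) - v) ** 2 for idx, v in entries.items())
    return (se / len(entries)) ** 0.5


def make_strata(n_blocks):
    """Each stratum lists n_blocks blocks that share no index range in any mode."""
    return [[(b, (b + s1) % n_blocks, (b + s2) % n_blocks) for b in range(n_blocks)]
            for s1 in range(n_blocks) for s2 in range(n_blocks)]


def cp_sgd(entries, shape, rank, n_blocks=2, epochs=100, lr=0.05, l2=0.0, l1=0.0, seed=0):
    """entries maps (i, j, k) -> observed value; unobserved cells are not stored."""
    rng = random.Random(seed)
    factors = [[[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n)]
               for n in shape]

    # Assign each observed entry to a block by splitting every mode into n_blocks ranges.
    blocks = {}
    for idx, v in entries.items():
        key = tuple(idx[m] * n_blocks // shape[m] for m in range(3))
        blocks.setdefault(key, []).append((idx, v))

    strata = make_strata(n_blocks)
    for _ in range(epochs):
        rng.shuffle(strata)
        for stratum in strata:
            # Blocks in one stratum touch disjoint rows of every factor matrix, so a
            # distributed version could give each block to a different worker.
            # Here they are simply processed one after another.
            for key in stratum:
                for idx, v in blocks.get(key, []):
                    a, b, c = (factors[m][idx[m]] for m in range(3))
                    err = predict(factors, idx) - v
                    for r in range(rank):
                        # Gradients of 0.5*err^2 + 0.5*l2*||row||^2 w.r.t. each factor entry.
                        ga = err * b[r] * c[r] + l2 * a[r]
                        gb = err * a[r] * c[r] + l2 * b[r]
                        gc = err * a[r] * b[r] + l2 * c[r]
                        a[r] = soft_threshold(a[r] - lr * ga, lr * l1)
                        b[r] = soft_threshold(b[r] - lr * gb, lr * l1)
                        c[r] = soft_threshold(c[r] - lr * gc, lr * l1)
    return factors


if __name__ == "__main__":
    # Toy example: a rank-1 4x4x4 tensor with only half of its cells observed.
    u, v, w = [0.5, 1.0, 0.8, 0.6], [0.9, 0.4, 0.7, 1.0], [0.6, 0.8, 1.0, 0.5]
    data = {(i, j, k): u[i] * v[j] * w[k]
            for i in range(4) for j in range(4) for k in range(4) if (i + j + k) % 2 == 0}
    before = rmse(data, cp_sgd(data, (4, 4, 4), rank=2, epochs=0))
    after = rmse(data, cp_sgd(data, (4, 4, 4), rank=2, epochs=200))
    print(f"RMSE before: {before:.3f}, after: {after:.3f}")
```

The stratified schedule is what makes the block design parallel-friendly. With `n_blocks` ranges per mode there are `n_blocks**3` blocks, grouped into `n_blocks**2` strata that together cover every block exactly once per epoch. In a real cluster, each block in a stratum would be shipped to a worker, and factor rows would be exchanged between strata.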
Pages: 659-669
Number of pages: 11