TSI-GNN: Extending Graph Neural Networks to Handle Missing Data in Temporal Settings

Cited by: 7

Authors:

Gordon, David ^{[1,2]}

Petousis, Panayiotis ^{[3]}

Zheng, Henry ^{[2]}

Zamanzadeh, Davina ^{[2]}

Bui, Alex A. T. ^{[1,2]}

Affiliations:

[1] Univ Calif Los Angeles, Dept Bioengn, Los Angeles, CA 90095 USA

[2] Univ Calif Los Angeles, Dept Radiol Sci, Med & Imaging Informat MII Grp, Los Angeles, CA 90095 USA

[3] UCLA, Clin & Translat Sci Inst, Los Angeles, CA USA

Source:

FRONTIERS IN BIG DATA | 2021 / Vol. 4

Keywords:

missing data; imputation; temporal data; irregular sampling; deep learning; graph neural networks; DATA IMPUTATION;

DOI:

10.3389/fdata.2021.693869

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We present a novel approach for imputing missing data that incorporates temporal information into bipartite graphs through an extension of graph representation learning. Missing data is abundant in several domains, particularly when observations are made over time. Most imputation methods make strong assumptions about the distribution of the data. While novel methods may relax some assumptions, they may not consider temporality. Moreover, when such methods are extended to handle time, they may not generalize without retraining. We propose using a joint bipartite graph approach to incorporate temporal sequence information. Specifically, the observation nodes and edges with temporal information are used in message passing to learn node and edge embeddings and to inform the imputation task. Our proposed method, temporal setting imputation using graph neural networks (TSI-GNN), captures sequence information that can then be used within an aggregation function of a graph neural network. To the best of our knowledge, this is the first effort to use a joint bipartite graph approach that captures sequence information to handle missing data. We use several benchmark datasets to test the performance of our method under a variety of conditions, comparing it to both classic and contemporary methods. We further provide insight into managing the size of the generated TSI-GNN model. Through our analysis, we show that incorporating temporal information into a bipartite graph improves the representation at the 30% and 60% missing rates, specifically when using a nonlinear model for downstream prediction tasks in regularly sampled datasets, and that it is competitive with existing temporal methods under different scenarios.
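
The bipartite-graph view of a data matrix mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_bipartite_edges`, the tuple layout, and the choice of one timestamp per observation row are assumptions made purely for illustration. Each row is treated as an observation node and each column as a feature node; every observed entry becomes an edge carrying its value and the row's time index, while missing entries become the edges an imputation model would have to predict.

```python
import math


def build_bipartite_edges(rows, timestamps):
    """Split a table with missing entries into observed and missing edges.

    Hypothetical helper, for illustration only.
    rows:       list of equal-length lists; None or NaN marks a missing value.
    timestamps: one time index per row (an assumed way to attach temporal
                information to each observation node).
    Returns (observed, missing), where
      observed is a list of (obs_node, feat_node, value, time) tuples and
      missing  is a list of (obs_node, feat_node) pairs to be imputed.
    """
    if len(rows) != len(timestamps):
        raise ValueError("need exactly one timestamp per observation row")

    observed, missing = [], []
    for i, (row, t) in enumerate(zip(rows, timestamps)):
        for j, value in enumerate(row):
            if value is None or math.isnan(value):
                missing.append((i, j))
            else:
                observed.append((i, j, value, t))
    return observed, missing


# Two observations of three features, taken at time steps 0 and 1.
rows = [
    [1.0, float("nan"), 3.0],
    [None, 5.0, 6.0],
]
observed, missing = build_bipartite_edges(rows, timestamps=[0, 1])
```

In this toy example, four entries are observed and two are missing. A graph neural network would then pass messages along the observed edges to learn embeddings, which this sketch does not attempt.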

Pages: 9