Motion-Appearance Co-Memory Networks for Video Question Answering

Cited by: 186

Authors:

Gao, Jiyang [1]

Ge, Runzhou [1]

Chen, Kan [1]

Nevatia, Ram [1]

Affiliations:

[1] Univ Southern Calif, Los Angeles, CA 90089 USA

Source:

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018

Keywords:

DOI:

10.1109/CVPR.2018.00688

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video Question Answering (QA) is an important task in understanding video temporal structure. We observe that video QA has three unique attributes compared with image QA: (1) it deals with long sequences of images containing richer information, not only in quantity but also in variety; (2) motion and appearance information are usually correlated with each other, and each can provide useful attention cues to the other; (3) different questions require different numbers of frames to infer the answer. Based on these observations, we propose a motion-appearance co-memory network for video QA. Our network builds on concepts from the Dynamic Memory Network (DMN) and introduces new mechanisms for video QA. Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representations dynamically for different questions. We evaluate our method on the TGIF-QA dataset, and the results significantly outperform the state of the art on all four tasks of TGIF-QA.


Pages: 6576-6585

Page count: 10
