Multimodal Local-Global Attention Network for Affective Video Content Analysis

Cited by: 41
Authors
Ou, Yangjun [1 ]
Chen, Zhenzhong [1 ]
Wu, Feng [2 ]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Peoples R China
Keywords
Visualization; Task analysis; Psychology; Feature extraction; Hidden Markov models; Analytical models; Brain modeling; Affective content analysis; multimodal learning; attention; EMOTION RECOGNITION; MODEL; REPRESENTATION; INTEGRATION; DATABASE;
DOI
10.1109/TCSVT.2020.3014889
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline codes
0808; 0809
Abstract
With the rapid development of video distribution and broadcasting, affective video content analysis has recently attracted considerable research and development activity. Predicting the emotional responses of movie audiences is a challenging task in affective computing, since induced emotions are relatively subjective. In this article, we propose a multimodal local-global attention network (MMLGAN) for affective video content analysis. Inspired by the multimodal integration effect, we extend the attention mechanism to multi-level fusion and design a multimodal fusion unit that produces a global representation of the affective video. The multimodal fusion unit selects key parts from multimodal local streams in the local attention stage and captures the distribution of information across time in the global attention stage. Experiments on the LIRIS-ACCEDE dataset, the MediaEval 2015 and 2016 datasets, the FilmStim dataset, the DEAP dataset and the VideoEmotion dataset demonstrate the effectiveness of our approach compared with state-of-the-art methods.
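The two-stage fusion described above (local attention within each modality's stream, then global attention across the resulting modality summaries) can be sketched in plain NumPy. This is an illustrative simplification, not the paper's exact MMLGAN architecture: the projection vectors `w` and `v`, the feature dimension, and the random inputs are all hypothetical stand-ins for learned parameters and real visual/audio features.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(features, w):
    # features: (T, d) frame/segment-level stream for one modality
    scores = features @ w            # (T,) relevance of each local part
    alpha = softmax(scores)          # attention weights over the stream
    return alpha @ features          # (d,) attended local summary

def global_attention(summaries, v):
    # summaries: (M, d), one attended summary per modality
    beta = softmax(summaries @ v)    # attention weights over modalities/time
    return beta @ summaries          # (d,) global video representation

rng = np.random.default_rng(0)
d = 8
visual = rng.standard_normal((10, d))   # hypothetical visual features, 10 frames
audio = rng.standard_normal((6, d))     # hypothetical audio features, 6 segments
w = rng.standard_normal(d)              # stand-in for a learned local projection
v = rng.standard_normal(d)              # stand-in for a learned global projection

summaries = np.stack([local_attention(visual, w),
                      local_attention(audio, w)])
g = global_attention(summaries, v)
print(g.shape)  # (8,)
```

The global vector `g` would then feed a regressor or classifier for valence/arousal prediction; in the actual network the attention stages are learned jointly with the feature extractors.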
Pages: 1901-1914
Page count: 14