VMemNet: A Deep Collaborative Spatial-Temporal Network With Attention Representation for Video Memorability Prediction

Cited by: 2

Authors:

Lu, Wei [1]

Zhai, Yujia [1]

Han, Jiaze [1]

Jing, Peiguang [1,2]

Liu, Yu [3]

Su, Yuting [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

[3] Tianjin Univ, Sch Microelect, Tianjin 300072, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Video memorability; Attention mechanism; Spatial-temporal features; MEMORY; MODELS;

DOI:

10.1109/TMM.2023.3327861

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video memorability measures the degree to which a video is remembered by different viewers and has shown great potential in various contexts, including advertising, education, and health care. While extensive research has been conducted on image memorability, the study of video memorability is still in its early stages. Existing methods in this field primarily focus on coarse-grained spatial feature representation and decision fusion strategies, overlooking the crucial interactions between spatial and temporal domains. Therefore, we propose an end-to-end collaborative spatial-temporal network called VMemNet, which incorporates targeted attention mechanisms and intermediate fusion strategies. This enables VMemNet to capture the intricate relationships between spatial and temporal information and uncover more elements of memorability within video visual features. VMemNet integrates spatially and semantically guided attention modules into a dual-stream network architecture, allowing it to simultaneously capture static local cues and dynamic global cues in videos. Specifically, the spatial attention module is used to aggregate more memorable elements from spatial locations, and the semantically guided attention module is used to achieve semantic alignment and intermediate fusion of the local and global cues. In addition, two types of loss functions with complementary decision rules are associated with the corresponding attention modules to guide the training process of the proposed network. Experimental results obtained on a publicly available dataset verify that the proposed VMemNet approach outperforms all current single- and multi-modal methods in terms of video memorability prediction.

Citation

Pages: 4926-4937

Page count: 12

