Fixation Prediction through Multimodal Analysis

Cited by: 91
Authors
Min, Xiongkuo [1]
Zhai, Guangtao [1]
Gu, Ke [1]
Yang, Xiaokang [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual attention; multimodal analysis; saliency; eye fixation prediction; attention fusion; SALIENCY DETECTION; INFLUENCE GAZE; MODEL; ATTENTION; LOCALIZATION; FRAMEWORK; FUSION;
DOI
10.1145/2996463
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In this article, we propose to predict human eye fixation by incorporating both audio and visual cues. Traditional visual attention models generally make full use of a stimulus's visual features, yet they bypass all audio information. In the real world, however, we not only direct our gaze according to visual saliency but are also attracted by salient audio cues. Psychological experiments show that audio influences visual attention, and subjects tend to be attracted by sound sources. Therefore, we propose fusing both audio and visual information to predict eye fixation. In our proposed framework, we first localize the moving-sound-generating objects through multimodal analysis and generate an audio attention map. Then, we calculate the spatial and temporal attention maps using the visual modality. Finally, the audio, spatial, and temporal attention maps are fused to generate the final audiovisual saliency map. The proposed method is applicable to scenes containing moving sound-generating objects. We gather a set of video sequences and collect eye-tracking data under an audiovisual test condition. Experimental results show that we achieve better eye fixation prediction performance when taking both audio and visual cues into consideration, especially in typical scenes in which object motion and audio are highly correlated.
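The final step described in the abstract, fusing the audio, spatial, and temporal attention maps into one saliency map, can be sketched as a simple per-frame combination. The sketch below is a minimal illustration, not the paper's actual fusion method: the function name `fuse_attention` and the equal-ish weights are assumptions, and the paper's fusion may be adaptive rather than a fixed linear blend.

```python
import numpy as np

def normalize(m):
    # Rescale a map to [0, 1]; leave constant maps unchanged to avoid 0/0.
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else m

def fuse_attention(audio, spatial, temporal, w=(0.4, 0.3, 0.3)):
    """Linearly fuse three per-frame attention maps into one saliency map.

    `w` holds illustrative weights, not values from the paper; all three
    maps must share the same height x width.
    """
    maps = [normalize(m) for m in (audio, spatial, temporal)]
    fused = sum(wi * mi for wi, mi in zip(w, maps))
    return normalize(fused)

# Example: fuse three random 4x4 maps.
rng = np.random.default_rng(0)
audio, spatial, temporal = (rng.random((4, 4)) for _ in range(3))
saliency = fuse_attention(audio, spatial, temporal)
```

A fixed linear blend is the simplest baseline; content-adaptive weighting (e.g., boosting the audio map only when motion and sound are correlated) would match the abstract's observations more closely.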
Pages: 23
Related Papers
50 records
  • [1] Fixation Prediction through Multimodal Analysis
    Min, Xiongkuo
    Zhai, Guangtao
    Hu, Chunjia
    Gu, Ke
    2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2015,
  • [2] Stochastic bottom-up fixation prediction and saccade generation
    Tavakoli, Hamed Rezazadegan
    Rahtu, Esa
    Heikkila, Janne
    IMAGE AND VISION COMPUTING, 2013, 31 (09) : 686 - 693
  • [3] Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia
    Park, Sunghyun
    Shim, Han Suk
    Chatterjee, Moitreya
    Sagae, Kenji
    Morency, Louis-Philippe
    ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS, 2016, 6 (03)
  • [4] Understanding Low- and High-Level Contributions to Fixation Prediction
    Kuemmerer, Matthias
    Wallis, Thomas S. A.
    Gatys, Leon A.
    Bethge, Matthias
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4799 - 4808
  • [5] Adversarial Multimodal Representation Learning for Click-Through Rate Prediction
    Li, Xiang
    Wang, Chao
    Tan, Jiwei
    Zeng, Xiaoyi
    Ou, Dan
    Zheng, Bo
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 827 - 836
  • [6] Exploring Latent Constructs through Multimodal Data Analysis
    Wang, Shiyu
    Wu, Shushan
    Chen, Yinghan
    Fang, Luyang
    Xiao, Liang
    Li, Feiming
    JOURNAL OF EDUCATIONAL MEASUREMENT, 2024,
  • [7] A Review of Key Technologies for Emotion Analysis Using Multimodal Information
    Zhu, Xianxun
    Guo, Chaopeng
    Feng, Heyang
    Huang, Yao
    Feng, Yichen
    Wang, Xiangyang
    Wang, Rui
    COGNITIVE COMPUTATION, 2024, 16 (04) : 1504 - 1530
  • [8] Learning a Combined Model of Visual Saliency for Fixation Prediction
    Wang, Jingwei
    Borji, Ali
Kuo, C.-C. Jay
    Itti, Laurent
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (04) : 1566 - 1579
  • [9] An Image Statistics-Based Model for Fixation Prediction
    Yanulevskaya, Victoria
    Marsman, Jan Bernard
    Cornelissen, Frans
    Geusebroek, Jan-Mark
    COGNITIVE COMPUTATION, 2011, 3 (01) : 94 - 104
  • [10] Fixation Prediction and Visual Priority Maps for Biped Locomotion
    Anantrasirichai, Nantheera
    Daniels, Katherine A. J.
    Burn, Jeremy F.
    Gilchrist, Iain D.
    Bull, David R.
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (08) : 2294 - 2306