Fixation Prediction through Multimodal Analysis

Cited by: 91
Authors
Min, Xiongkuo J [1]
Zhai, Guangtao [1]
Gu, Ke [1]
Yang, Xiaokang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai Key Lab Digital Media Proc & Transmiss, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual attention; multimodal analysis; saliency; eye fixation prediction; attention fusion; SALIENCY DETECTION; INFLUENCE GAZE; MODEL; ATTENTION; LOCALIZATION; FRAMEWORK; FUSION;
DOI
10.1145/2996463
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this article, we propose to predict human eye fixation by incorporating both audio and visual cues. Traditional visual attention models generally make the most of a stimulus's visual features, yet they ignore all audio information. In the real world, however, we not only direct our gaze according to visual saliency but are also attracted by salient audio cues. Psychological experiments show that audio influences visual attention and that subjects tend to be attracted by sound sources. Therefore, we propose fusing both audio and visual information to predict eye fixation. In our proposed framework, we first localize the moving sound-generating objects through multimodal analysis and generate an audio attention map. Then, we calculate the spatial and temporal attention maps using the visual modality. Finally, the audio, spatial, and temporal attention maps are fused to generate the final audiovisual saliency map. The proposed method is applicable to scenes containing moving sound-generating objects. We gather a set of video sequences and collect eye-tracking data under an audiovisual test condition. Experimental results show that we can achieve better eye fixation prediction performance when taking both audio and visual cues into consideration, especially in some typical scenes in which object motion and audio are highly correlated.
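The final fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three maps are combined, so the weighted linear combination below is an assumption and not the authors' method. The names `normalize`, `fuse_maps`, and the `weights` parameter are hypothetical, and the maps are plain nested lists of equal size.

```python
def normalize(m):
    """Min-max scale a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse_maps(audio, spatial, temporal, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse three same-sized attention maps into one saliency map.

    Assumption: a weighted average of the normalized maps. The paper's
    actual fusion rule is not given in the abstract.
    """
    maps = [normalize(m) for m in (audio, spatial, temporal)]
    total = sum(weights)
    rows, cols = len(audio), len(audio[0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) / total
         for c in range(cols)]
        for r in range(rows)
    ]
    # Rescale so the fused map again spans [0, 1].
    return normalize(fused)


# Toy 2x2 maps: all three modalities agree on the bottom-right location.
audio = [[0, 0], [0, 4]]
spatial = [[1, 0], [0, 1]]
temporal = [[0, 0], [0, 2]]
saliency = fuse_maps(audio, spatial, temporal)
```

In this toy case the bottom-right cell, which every modality marks as salient, receives the highest fused value, while the top-left cell, supported only by the spatial map, receives a lower one.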
Pages: 23
Related papers
50 in total
  • [41] Salient Object Detection Driven by Fixation Prediction
    Wang, Wenguan
    Shen, Jianbing
    Dong, Xingping
    Borji, Ali
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018 : 1711 - 1720
  • [42] Prediction of Human Eye Fixation by a Single Filter
    Tang, He
    Chen, Chuanbo
    Bie, Yanan
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2017, 87 (02) : 197 - 202
  • [43] Fast curvelet transform through genetic algorithm for multimodal medical image fusion
    Arif, Muhammad
    Wang, Guojun
    [J]. SOFT COMPUTING, 2020, 24 (03) : 1815 - 1836
  • [44] Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions
    Gandhi, Ankita
    Adhvaryu, Kinjal
    Poria, Soujanya
    Cambria, Erik
    Hussain, Amir
    [J]. INFORMATION FUSION, 2023, 91 : 424 - 444
  • [45] Saliency Prediction on Mobile Videos: A Fixation Mapping-Based Dataset and A Transformer Approach
    Wen, Shijie
    Yang, Li
    Xu, Mai
    Qiao, Minglang
    Xu, Tao
    Bai, Lin
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5935 - 5950
  • [46] Multimodal Imputation-Based Multimodal Autoencoder Framework for AQI Classification and Prediction of Indian Cities
    Rao, Routhu Srinivasa
    Kalabarige, Lakshmana Rao
    Holla, M. Raviraja
    Sahu, Aditya Kumar
    [J]. IEEE ACCESS, 2024, 12 : 108350 - 108363
  • [47] Automatic Deceit Detection Through Multimodal Analysis of High-Stake Court-Trials
    Bicer, Berat
    Dibeklioglu, Hamdi
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01) : 342 - 356
  • [48] Exploring Multimodal Literacy through Digital Gameplay and Analysis---Chinese Adolescents' Learning Experience
    Shen, Yanan
    Ab Jalil, Habibah
    Jamaluddin, Rahimah
    [J]. EURASIAN JOURNAL OF EDUCATIONAL RESEARCH, 2024, (109) : 177 - 197
  • [49] Multimodal Deep Learning Crime Prediction Using Tweets
    Tam, Sakirin
    Tanriover, Omer Ozgur
    [J]. IEEE ACCESS, 2023, 11 : 93204 - 93214
  • [50] A Multimodal Approach for Mania Level Prediction in Bipolar Disorder
    Baki, Pınar
    Kaya, Heysem
    Ciftci, Elvan
    Gulec, Huseyin
    Salah, Albert Ali
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (04) : 2119 - 2131