HEOI: Human Attention Prediction in Natural Daily Life With Fine-Grained Human-Environment-Object Interaction Model

Times cited: 0

Authors:

Nan, Zhixiong [1]

Jia, Leiyu [1]

Xiao, Bin [2]

Affiliations:

[1] Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Dept Comp Sci & Technol, Chongqing 400065, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2025 / Vol. 34

Keywords:

Computational modeling; Predictive models; Head; Computer architecture; Computer vision; Three-dimensional displays; Resource management; Psychology; Feature extraction; Cognition; Human attention; attention prediction; multi-granularity human cues; NEURAL MECHANISMS; DRIVEN ATTENTION; VISUAL-ATTENTION; GAZE ESTIMATION; STIMULUS; MODULATION;

DOI:

10.1109/TIP.2024.3512380

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper addresses the problem of predicting human attention in natural daily life from a third-person view. Because this topic matters in many applications, researchers in the computer vision community have proposed many models over the past few decades, and in recent years many of them have begun to focus on natural daily-life scenarios. However, existing mainstream models usually overlook the basic fact that human attention is an inherently interdisciplinary concept. Specifically, the mainstream definition of attention is direction-level or pixel-level, whereas many interdisciplinary studies argue for an object-level definition. Additionally, mainstream model structures have converged on the dual-pathway architecture or its variants, whereas most interdisciplinary studies hold that attention is involved in the process of human-environment interaction. Grounded in established theories and studies from interdisciplinary fields including computer vision, cognition, neuroscience, psychology, and philosophy, this paper proposes a fine-grained Human-Environment-Object Interaction (HEOI) model, which for the first time integrates multi-granularity human cues to predict human attention. Our model is explainable and lightweight, and its effectiveness is validated by extensive comparison, ablation, and visualization experiments on two public datasets.

Pages: 170–182

Page count: 13

Number of references: 83
