Human-object interaction (HOI) detection aims to localize humans and objects in an image and infer the interactions between them. Recent work has applied transformer encoder-decoder architectures to HOI detection with strong results, but these models have several drawbacks: they do not employ a fully disentangled strategy to learn discriminative features for the different sub-tasks; they do not achieve sufficient contextual exchange within each branch, which is crucial for accurate relational reasoning; and their complex attention computations incur high computational cost and large memory usage. In this work, we propose a disentangled transformer network that disentangles both the encoder and the decoder into three branches for human detection, object detection, and interaction classification. We then propose a novel feature unify decoder to associate the predictions of the disentangled decoders, and introduce a multiplex relation embedding module and an attentive fusion module to enable sufficient contextual information exchange among the branches. In addition, to reduce the model's computational cost, we incorporate position-sensitive axial attention into the encoder, allowing our model to achieve a better accuracy-complexity trade-off. Extensive experiments on two public HOI benchmarks demonstrate the effectiveness of our approach: our model outperforms prior methods and achieves state-of-the-art performance.
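The abstract does not specify the axial-attention formulation used; the sketch below is a minimal NumPy illustration of the standard position-sensitive axial attention idea (attention applied along the height axis and then the width axis, with learned relative-position terms added to the logits and values), not the authors' implementation. All names (`axial_attention_1d`, `axial_block`, the `Rq/Rk/Rv` embedding tensors) are hypothetical; factorizing 2-D attention this way reduces cost from O((HW)^2) for full self-attention to O(HW(H+W)).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def axial_attention_1d(x, Wq, Wk, Wv, Rq, Rk, Rv):
    """Position-sensitive self-attention along one axis.

    x: (L, d) feature sequence (a single row or column of the feature map).
    Wq, Wk, Wv: (d, d) projection matrices.
    Rq, Rk, Rv: (L, L, d) relative position embeddings; entry [i, j] is the
    embedding for query position i attending to key position j.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Logits combine a content term with two positional terms
    # (query-position and key-position interactions).
    logits = q @ k.T
    logits += np.einsum('id,ijd->ij', q, Rq)
    logits += np.einsum('jd,ijd->ij', k, Rk)
    a = softmax(logits / np.sqrt(x.shape[1]))
    # Values are also augmented with a positional term before aggregation.
    return a @ v + np.einsum('ij,ijd->id', a, Rv)

def axial_block(feat, params_h, params_w):
    """Attend along columns (height axis), then along rows (width axis).

    feat: (H, W, d). Each 1-D pass attends over at most max(H, W)
    positions instead of all H*W, which is the source of the savings.
    """
    H, W, _ = feat.shape
    out = np.stack([axial_attention_1d(feat[:, j], *params_h)
                    for j in range(W)], axis=1)   # height-axis pass
    out = np.stack([axial_attention_1d(out[i], *params_w)
                    for i in range(H)], axis=0)   # width-axis pass
    return out
```

A usage sketch: with a feature map of shape `(H, W, d)`, `params_h` holds projections plus `(H, H, d)` relative embeddings and `params_w` holds projections plus `(W, W, d)` embeddings; the output keeps the input shape, so the block can drop into an encoder layer in place of full 2-D self-attention.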