HOI as Embeddings: Advancements of Model Representation Capability in Human-Object Interaction Detection

Times cited: 0

Authors:

Chen, Junwen [1]

Wang, Yingcheng [1]

Yanai, Keiji [1]

Affiliation:

[1] Univ Electrocommun, Dept Informat, Tokyo, Japan

Source:

2024 IEEE 7TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 2024 | 2024

Keywords:

Human-Object Interaction; Transformer;

DOI:

10.1109/MIPR62202.2024.00025

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, Human-Object Interaction Detection (HOID) has attracted increasing attention in the computer vision community and has advanced greatly with the introduction of transformer-based models. However, the representation capability of pre-trained object detection models is insufficient to capture the complex interactions between humans and objects, which limits the performance of HOID methods. In this paper, we introduce three methods that progressively enhance this representation capability. (1) We propose QAHOI, which exploits multi-scale feature maps. (2) We propose PQNet, which speeds up training convergence with parallel queries. (3) We propose SOV-STG, which combines the merits of QAHOI and PQNet and introduces a denoising learning strategy to further improve training convergence and performance. SOV-STG achieves state-of-the-art performance on the HICO-DET dataset while using one-third of the training epochs required by previous state-of-the-art methods.


Pages: 116-122

Number of pages: 7

Cited references: 42
