Pairwise Negative Sample Mining for Human-Object Interaction Detection

Cited by: 0

Authors:

Jia, Weizhe [1]

Ma, Shiwei [1]

Affiliations:

[1] Shanghai Univ, Sch Mech Engn & Automat, Shanghai 200444, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII | 2024 / Vol. 14431

Keywords:

Human-object interaction; Transformer; Sample mining

DOI:

10.1007/978-981-99-8540-1_34

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, Human-Object Interaction (HOI) detection has been riding the wave of the development of object detectors. Typically, one-stage methods exploit them by instantiating HOIs and detecting them in a coarse end-to-end learning scheme. With the invention of the detection transformer (DETR), more studies followed and addressed HOI detection in this novel set-prediction manner, achieving decent performance. However, the scarcity of positive samples in the dataset, especially in object-dense images, hinders learning and leads to lower detection quality. To alleviate this issue, we propose a sample mining technique that utilizes non-interactive human-object pairs to generate negative samples containing HOI features, enriching the sample sets to help the learning of queries. We also introduce interactivity priors, namely filtering out insignificant background objects to inhibit their disturbance. Our technique can be seamlessly integrated into an end-to-end training scheme. Additionally, we propose an unorthodox two-stage transformer-based method that separates pairwise detection and interaction inference, handling them with two cascaded decoders, to further exploit this technique. Experimental results on mainstream datasets demonstrate that our approach achieves new state-of-the-art performance, surpassing both one-stage and traditional two-stage methods. Our study reveals the potential to convert between the two method types by adjusting data utilization.


Pages: 425-437

Page count: 13

20 references in total

[1] End-to-End Object Detection with Transformers [J].

Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey.

COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229

[2] Learning to Detect Human-Object Interactions [J].

Chao, Yu-Wei; Liu, Yunfan; Liu, Xieyang; Zeng, Huayi; Deng, Jia.

2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, :381-389

[3] DRG: Dual Relation Graph for Human-Object Interaction Detection [J].

Gao, Chen; Xu, Jiarui; Zou, Yuliang; Huang, Jia-Bin.

COMPUTER VISION - ECCV 2020, PT XII, 2020, 12357 :696-712

[4] Detecting and Recognizing Human-Object Interactions [J].

Gkioxari, Georgia; Girshick, Ross; Dollar, Piotr; He, Kaiming.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :8359-8367

[5] Gupta S, 2015, arXiv:1505.04474

[6] Deep Residual Learning for Image Recognition [J].

He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian.

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778

[7] HOTR: End-to-End Human-Object Interaction Detection with Transformers [J].

Kim, Bumsoo; Lee, Junhyun; Kang, Jaewoo; Kim, Eun-Sol; Kim, Hyunwoo J.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :74-83

[8] UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection [J].

Kim, Bumsoo; Choi, Taeho; Kang, Jaewoo; Kim, Hyunwoo J.

COMPUTER VISION - ECCV 2020, PT XV, 2020, 12360 :498-514

[9] The Hungarian Method for the assignment problem [J].

Kuhn, HW.

NAVAL RESEARCH LOGISTICS, 2005, 52 (01) :7-21

[10] Focal Loss for Dense Object Detection [J].

Lin, Tsung-Yi; Goyal, Priya; Girshick, Ross; He, Kaiming; Dollar, Piotr.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :2999-3007
