Learning Human-Object Interaction Detection via Deformable Transformer

Cited by: 0

Authors:

Cai, Shuang [1]

Ma, Shiwei [1]

Gu, Dongzhou [1]

Affiliations:

[1] Shanghai Univ, Sch Mechatron Engn & Automat, Shanghai, Peoples R China

Source:

2021 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO PROCESSING, AND ARTIFICIAL INTELLIGENCE | 2021 / Vol. 12076

Keywords:

Human-object interaction; deformable transformer; attention mechanism; contextual information;

DOI:

10.1117/12.2606873

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The goal of human-object interaction (HOI) detection is to localize both the human and the object in an image and to recognize the interactions between them. HOIs are often scattered across the image. Traditional CNN-based methods are unable to aggregate information that is scattered across the image. Many newer methods use contextual features cropped from the CNN outputs, which are sometimes not effective enough. To overcome this challenge, we use a deformable transformer to aggregate the full feature maps output from the CNN. The attention mechanism and query-based predictions are the keys. Given the success of methods based on graph neural networks, the attention mechanism has proven effective at aggregating contextual information image-wide. The queries can extract the features of each human-object pair without mixing in the features of other instances. The deformable transformer can extract effective embeddings, and the prediction heads can be fairly simple. Experimental results show that the proposed method is effective for HOI detection.


Pages: 6
