HOTR: End-to-End Human-Object Interaction Detection with Transformers

被引:164
|
作者
Kim, Bumsoo [1 ,2 ]
Lee, Junhyun [2 ]
Kang, Jaewoo [2 ]
Kim, Eun-Sol [1 ]
Kim, Hyunwoo J. [2 ]
机构
[1] Kakao Brain, Seongnam, South Korea
[2] Korea Univ, Seoul, South Korea
基金
新加坡国家研究基金会;
关键词
D O I
10.1109/CVPR46437.2021.00014
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves the i) localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods have indirectly addressed this task by detecting human and object instances and individually inferring every pair of the detected instances. In this paper, we present a novel framework, referred by HOTR, which directly predicts a set of < human, object, interaction > triplets from an image based on a transformer encoder-decoder architecture. Through the set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing which is the main bottleneck of existing methods. Our proposed algorithm achieves the state-of-the-art performance in two HOI detection benchmarks with an inference time under 1 ms after object detection.
引用
收藏
页码:74 / 83
页数:10
相关论文
共 50 条
  • [31] Exploring End-to-End object detection with transformers versus YOLOv8 for enhanced citrus fruit detection within trees
    Jrondi, Zineb
    Moussaid, Abdellatif
    Hadi, Moulay Youssef
    SYSTEMS AND SOFT COMPUTING, 2024, 6
  • [32] An Improved Human-Object Interaction Detection Network
    Gao, Song
    Wang, Hongyu
    Song, Jilai
    Xu, Fang
    Zou, Fengshan
    PROCEEDINGS OF 2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (IEEE-ASID'2019), 2019, : 192 - 196
  • [33] End-to-End Object Detection with Enhanced Positive Sample Filter
    Song, Xiaolin
    Chen, Binghui
    Li, Pengyu
    Wang, Biao
    Zhang, Honggang
    APPLIED SCIENCES-BASEL, 2023, 13 (03):
  • [34] Dynamic DETR: End-to-End Object Detection with Dynamic Attention
    Dai, Xiyang
    Chen, Yinpeng
    Yang, Jianwei
    Zhang, Pengchuan
    Yuan, Lu
    Zhang, Lei
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2968 - 2977
  • [35] End-To-End High-Quality Transformer Object Detection Model Applied to Human Head Detection
    Zhou, Zhen
    Li, Rongchun
    Qiao, Peng
    Jiang, Jingfei
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XII, 2025, 15042 : 404 - 417
  • [36] Distance Matters in Human-Object Interaction Detection
    Wang, Guangzhi
    Guo, Yangyang
    Wong, Yongkang
    Kankanhalli, Mohan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4546 - 4554
  • [37] Human-object interaction detection with missing objects
    Kogashi, Kaen
    Wu, Yang
    Nobuhara, Shohei
    Nishino, Ko
    IMAGE AND VISION COMPUTING, 2021, 113
  • [38] Agglomerative Transformer for Human-Object Interaction Detection
    Tu, Danyang
    Sun, Wei
    Zhai, Guangtao
    Shen, Wei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21557 - 21567
  • [39] TransVG: End-to-End Visual Grounding with Transformers
    Deng, Jiajun
    Yang, Zhengyuan
    Chen, Tianlang
    Zhou, Wengang
    Li, Houqiang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1749 - 1759
  • [40] End-to-end Lane Shape Prediction with Transformers
    Liu, Ruijin
    Yuan, Zejian
    Liu, Tie
    Xiong, Zhiliang
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3693 - 3701