Zero-Shot Human-Object Interaction Detection via Similarity Propagation

Times cited: 4
Authors
Zong, Daoming [1]
Sun, Shiliang [1, 2, 3]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] East China Normal Univ, Key Lab Adv Theory & Applicat Stat & Data Sci, Minist Educ, Shanghai 200062, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Human-object interaction (HOI) detection; object detection; zero-shot learning (ZSL);
DOI
10.1109/TNNLS.2023.3309104
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-object interaction (HOI) detection involves identifying interactions represented as ⟨human, action, object⟩ triplets, which requires localizing human-object pairs and classifying their interactions within an image. This work focuses on the challenge of detecting HOIs with unseen objects using the prevalent Transformer architecture. Our empirical analysis reveals that the performance degradation on novel HOI instances arises primarily from misclassifying unseen objects as confusable seen objects. To address this issue, we propose a similarity propagation (SP) scheme that leverages cosine similarity distance to regulate the prediction margin between seen and unseen objects. In addition, we introduce pseudo-supervision for unseen objects based on class semantic similarities during training. Furthermore, we incorporate semantic-aware instance-level and interaction-level contrastive losses into the Transformer to enhance intraclass compactness and interclass separability, resulting in improved visual representations. Extensive experiments on two challenging benchmarks, V-COCO and HICO-DET, demonstrate the effectiveness of our model, which outperforms current state-of-the-art methods under various zero-shot settings.
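The abstract does not say how the similarity-based pseudo-supervision is computed. As a rough illustration only, the sketch below shows one common way to turn class semantic similarity into soft targets: most of the label mass stays on the ground-truth seen class, and the remainder is spread over unseen classes according to a temperature softmax of the cosine similarity between class embeddings. This is an assumption, not the authors' method; the function `pseudo_targets`, the parameters `alpha` and `temperature`, and the toy word vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature; lower temperature = sharper."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_targets(gt_class, class_emb, unseen, alpha=0.9, temperature=0.1):
    """Hypothetical soft target over all classes.

    alpha goes to the ground-truth (seen) class; the remaining 1 - alpha is
    distributed over the unseen classes in proportion to a softmax of their
    cosine similarity to the ground-truth class embedding.
    """
    target = {c: 0.0 for c in class_emb}
    target[gt_class] = alpha
    sims = [cosine(class_emb[gt_class], class_emb[u]) for u in unseen]
    for u, w in zip(unseen, softmax(sims, temperature)):
        target[u] += (1.0 - alpha) * w
    return target


# Toy 3-d "semantic" embeddings (hypothetical values, for illustration only).
emb = {
    "cup": [1.0, 0.0, 0.1],    # seen
    "horse": [0.0, 1.0, 0.0],  # seen
    "mug": [0.9, 0.1, 0.1],    # unseen, semantically close to "cup"
    "zebra": [0.1, 0.9, 0.2],  # unseen, semantically close to "horse"
}
t = pseudo_targets("cup", emb, ["mug", "zebra"])
print({k: round(v, 4) for k, v in t.items()})
```

For a box labeled "cup", nearly all of the 0.1 leftover mass lands on the similar unseen class "mug" rather than on "zebra", so the classifier is not pushed to give unseen-but-related objects zero probability. Whether such soft targets match what the paper calls pseudo-supervision cannot be determined from the abstract alone.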
Pages: 17805–17816
Page count: 12
Related papers
50 in total
  • [1] Towards zero-shot human-object interaction detection via vision-language integration
    Xue, Weiying
    Liu, Qi
    Wang, Yuxiao
    Wei, Zhenao
    Xing, Xiaofen
    Xu, Xiangmin
    NEURAL NETWORKS, 2025, 187
  • [2] ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
    Liu, Ye
    Yuan, Junsong
    Chen, Chang Wen
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4235 - 4243
  • [3] Zero-Shot Learning on Human-Object Interaction Recognition in video
    Maraghi, Vali Ollah
    Faez, Karim
    2019 5TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS 2019), 2019,
  • [4] Scaling Human-Object Interaction Recognition through Zero-Shot Learning
    Shen, Liyue
    Yeung, Serena
    Hoffman, Judy
    Mori, Greg
    Li Fei-Fei
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1568 - 1576
  • [5] Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning
    Maraghi, Vali Ollah
    Faez, Karim
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [6] Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
    Huynh, Dat
    Elhamifar, Ehsan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8452 - 8463
  • [7] Zero-Shot Object Detection
    Bansal, Ankan
    Sikka, Karan
    Sharma, Gaurav
    Chellappa, Rama
    Divakaran, Ajay
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 397 - 414
  • [8] Zero-Shot Object Detection With Attributes-Based Category Similarity
    Mao, Qiaomei
    Wang, Chong
    Yu, Shenghao
    Zheng, Ye
    Li, Yuqi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2020, 67 (05) : 921 - 925
  • [9] ZERO-SHOT HUMAN-OBJECT INTERACTION (HOI) CLASSIFICATION BY BRIDGING GENERATIVE AND CONTRASTIVE IMAGE-LANGUAGE MODELS
    Jin, Ying
    Chen, Yinpeng
    Wang, Jianfeng
    Wang, Lijuan
    Hwang, Jenq-Neng
    Liu, Zicheng
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1970 - 1974
  • [10] ZERO-SHOT OBJECT DETECTION WITH TRANSFORMERS
    Zheng, Ye
    Cui, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 444 - 448