Global Semantic Descriptors for Zero-Shot Action Recognition

Cited by: 2

Authors:

Estevam, Valter [1,2]

Laroca, Rayson [2]

Pedrini, Helio [3]

Menotti, David [2]

Affiliations:

[1] Fed Inst Parana, BR-84500000 Irati, Brazil

[2] Univ Fed Parana, BR-81531970 Curitiba, Parana, Brazil

[3] Univ Estadual Campinas, BR-13083852 Campinas, Brazil

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Keywords:

Semantics; Computational modeling; Encoding; Observers; Estimation; Training; Labeling; Zero-shot learning; sentence representation; video captioning; object recognition;

DOI:

10.1109/LSP.2022.3200605

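Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes: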

0808 ; 0809 ;

Abstract:

The success of zero-shot action recognition (ZSAR) methods is intrinsically related to the nature of semantic side information used to transfer knowledge, although this aspect has not been primarily investigated in the literature. This work introduces a new ZSAR method based on the relationships of actions-objects and actions-descriptive sentences. We demonstrate that representing all object classes using descriptive sentences generates an accurate object-action affinity estimation when a paraphrase estimation method is used as an embedder. We also show how to estimate probabilities over the set of action classes based only on a set of sentences without hard human labeling. In our method, the probabilities from these two global classifiers (i.e., which use features computed over the entire video) are combined, producing an efficient transfer knowledge model for action classification. Our results are state-of-the-art in the Kinetics-400 dataset and are competitive on UCF-101 under the ZSAR evaluation. Our code is available at https://github.com/valterlej/objsentzsar.
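The abstract describes combining action probabilities from two global classifiers, one driven by object-action affinity and one by descriptive sentences. The following is a minimal sketch of that general idea, not the authors' implementation (see their repository for that). Everything here is an assumption for illustration: the toy 2-D vectors stand in for sentence embeddings, the function names are hypothetical, cosine similarity is used as the affinity measure, and the fusion rule (element-wise product followed by renormalization) is just one common choice.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def action_probs_from_objects(object_probs, object_embs, action_embs):
    # Score each action by the object probabilities weighted by
    # object-action affinity (assumed here to be cosine similarity).
    scores = [
        sum(p * cosine(a, o) for p, o in zip(object_probs, object_embs))
        for a in action_embs
    ]
    return softmax(scores)

def action_probs_from_sentences(sentence_embs, action_embs):
    # Score each action by its best-matching video sentence (assumed rule).
    scores = [max(cosine(a, s) for s in sentence_embs) for a in action_embs]
    return softmax(scores)

def fuse(p_obj, p_sent):
    # Assumed fusion rule: element-wise product, then renormalize.
    prod = [a * b for a, b in zip(p_obj, p_sent)]
    total = sum(prod)
    return [x / total for x in prod]

# Toy data (hypothetical): two unseen actions, two object classes,
# one generated sentence for the video.
action_embs = [[1.0, 0.0], [0.0, 1.0]]    # e.g. "playing guitar", "swimming"
object_embs = [[0.9, 0.1], [0.1, 0.9]]    # e.g. "guitar", "pool"
object_probs = [0.8, 0.2]                 # object classifier output for the video
sentence_embs = [[0.8, 0.2]]              # embedding of one video caption

p_obj = action_probs_from_objects(object_probs, object_embs, action_embs)
p_sent = action_probs_from_sentences(sentence_embs, action_embs)
fused = fuse(p_obj, p_sent)
predicted = max(range(len(fused)), key=lambda i: fused[i])
```

In this toy setup both classifiers favor the first action, so the fused distribution does as well. In practice the embeddings would come from a sentence encoder and the object probabilities from a pretrained object classifier.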


Pages: 1843-1847

Page count: 5

