Improving zero-shot action recognition using human instruction with text description

Authors:

Nan Wu

Hiroshi Kera

Kazuhiko Kawamoto

Affiliations:

[1] Chiba University, Graduate School of Science and Engineering

[2] Chiba University, Graduate School of Engineering

Source:

Applied Intelligence | 2023 / Volume 53

Keywords:

Zero-shot learning; Zero-shot action recognition; Visual question answering

DOI:

Not available

Abstract:

Zero-shot action recognition, which recognizes actions in videos without having received any training examples, is gaining wide attention considering it can save labor costs and training time. Nevertheless, the performance of zero-shot learning is still unsatisfactory, which limits its practical application. To solve this problem, this study proposes a framework to improve zero-shot action recognition using human instructions with text descriptions. The proposed framework manually describes video contents, which incurs some labor costs; in many situations, the labor costs are worth it. We manually annotate text features for each action, which can be a word, phrase, or sentence. Then by computing the matching degrees between the video and all text features, we can predict the class of the video. Furthermore, the proposed model can also be combined with other models to improve its accuracy. In addition, our model can be continuously optimized to improve the accuracy by repeating human instructions. The results with UCF101 and HMDB51 showed that our model achieved the best accuracy and improved the accuracies of other models.
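
The matching step described in the abstract (score a video against every class's text features, then pick the best class) can be illustrated with a minimal sketch. The abstract does not say how the matching degree is computed, so everything here is an assumption for illustration: the cosine similarity measure, the toy embedding vectors, the class names, and the hypothetical helpers `cosine` and `predict_class`. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed matching degree)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_class(video_vec, class_texts):
    """Score each class by its best-matching text feature and return the top class.

    class_texts maps a class label to a list of text-feature vectors, since a
    class may be described by several words, phrases, or sentences.
    """
    scores = {
        label: max(cosine(video_vec, t) for t in text_vecs)
        for label, text_vecs in class_texts.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings, invented purely for illustration.
video = [0.9, 0.1, 0.2]
texts = {
    "archery": [[1.0, 0.0, 0.1], [0.8, 0.2, 0.3]],
    "diving": [[0.0, 1.0, 0.2]],
}
label, scores = predict_class(video, texts)
print(label, scores)
```

Under this assumption, repeating the human instruction amounts to adding or revising entries in `texts` for a class and re-running `predict_class`; no retraining is involved in the sketch.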

Pages: 24142-24156

Page count: 14
