MetaVD: A Meta Video Dataset for enhancing human action recognition datasets

Cited by: 6

Authors:

Yoshikawa, Yuya [1]

Shigeto, Yutaro [1]

Takeuchi, Akikazu [1]

Affiliations:

[1] Chiba Inst Technol, Software Technol & Artificial Intelligence Res La, Chiba, Japan

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2021 / Vol. 212

Keywords:

Human action recognition; Video datasets;

DOI:

10.1016/j.cviu.2021.103276

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Numerous practical datasets have been developed to recognize human actions from videos. However, many of them were constructed by collecting videos within a limited domain; thus, a model trained using one of the existing datasets often fails to classify videos in a different domain accurately. A possible solution for this drawback is to enhance the domain of each action label, i.e., to import videos associated with a given action label from the other datasets, and then, to train a model using the enhanced dataset. To realize this solution, we constructed a meta video dataset from the existing datasets for human action recognition, referred to as MetaVD. MetaVD comprises six popular human action recognition datasets, which we integrated by annotating 568,015 relation labels in total. These relation labels reflect equality, similarity, and hierarchy between action labels of the original datasets. We further present simple yet effective dataset enhancement methods using MetaVD, which are useful for training models with higher generalization performance, as established by experiments on human action classification. As a further contribution of MetaVD, we show that its analysis can provide useful insight into the datasets.


Pages: 14
