MetaVD: A Meta Video Dataset for enhancing human action recognition datasets

Times cited: 6
Authors
Yoshikawa, Yuya [1]
Shigeto, Yutaro [1]
Takeuchi, Akikazu [1]
Affiliation
[1] Chiba Institute of Technology, Software Technology and Artificial Intelligence Research Laboratory, Chiba, Japan
Keywords
Human action recognition; Video datasets
DOI
10.1016/j.cviu.2021.103276
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Numerous practical datasets have been developed to recognize human actions from videos. However, many of them were constructed by collecting videos within a limited domain; thus, a model trained on one of the existing datasets often fails to classify videos from a different domain accurately. A possible solution to this drawback is to enhance the domain of each action label, i.e., to import videos associated with a given action label from the other datasets and then train a model on the enhanced dataset. To realize this solution, we constructed a meta video dataset from the existing datasets for human action recognition, referred to as MetaVD. MetaVD comprises six popular human action recognition datasets, which we integrated by annotating 568,015 relation labels in total. These relation labels reflect equality, similarity, and hierarchy between the action labels of the original datasets. We further present simple yet effective dataset enhancement methods using MetaVD, which help train models with higher generalization performance, as shown by experiments on human action classification. As a further contribution, we show that analyzing MetaVD can provide useful insight into the datasets themselves.
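The enhancement idea in the abstract (import videos whose labels in other datasets are related to a target label, then train on the union) can be illustrated with a minimal sketch. It assumes relation labels are stored as hypothetical `((dataset, label), relation, (dataset, label))` tuples and videos as lists of file names; the function `enhance_dataset`, the relation strings, and the example labels and relations are illustrative only and are not MetaVD's release format, annotations, or the authors' exact method.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, List, Tuple

# A label is identified by (dataset name, action label name). Hypothetical layout.
Label = Tuple[str, str]
# Hypothetical relation record: (label A, relation name, label B).
Relation = Tuple[Label, str, Label]


def enhance_dataset(
    target: str,
    videos: Dict[Label, List[str]],
    relations: List[Relation],
    accepted: AbstractSet[str] = frozenset({"equal"}),
) -> Dict[str, List[str]]:
    """Map each target-dataset label to its own videos plus videos imported
    from other datasets' labels linked to it by an accepted relation.

    Accepted relations are treated as symmetric, which suits equality and
    similarity but not directional hierarchy relations.
    """
    enhanced: Dict[str, List[str]] = defaultdict(list)
    # Start from the target dataset's own videos.
    for (dataset, label), clips in videos.items():
        if dataset == target:
            enhanced[label].extend(clips)
    # Import videos across accepted relations, checking both directions.
    for a, rel, b in relations:
        if rel not in accepted:
            continue
        for mine, other in ((a, b), (b, a)):
            if mine[0] == target and other[0] != target:
                enhanced[mine[1]].extend(videos.get(other, []))
    return dict(enhanced)


# Illustrative data only; these relations are not taken from MetaVD.
videos = {
    ("ucf101", "Archery"): ["u1.mp4"],
    ("kinetics700", "archery"): ["k1.mp4", "k2.mp4"],
    ("hmdb51", "shoot_bow"): ["h1.mp4"],
}
relations = [
    (("kinetics700", "archery"), "equal", ("ucf101", "Archery")),
    (("hmdb51", "shoot_bow"), "similar", ("ucf101", "Archery")),
]

print(enhance_dataset("ucf101", videos, relations))
print(enhance_dataset("ucf101", videos, relations, accepted={"equal", "similar"}))
```

Restricting `accepted` to equality gives the most conservative import; admitting similarity trades label purity for more domain diversity, which is the kind of choice the abstract's "dataset enhancement methods" would have to make.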
Pages: 14