MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding

Cited by: 47
Authors
Kong, Quan [1 ]
Wu, Ziming [2 ,4 ]
Deng, Ziwei [1 ]
Klinkigt, Martin [1 ]
Tong, Bin [1 ,3 ,4 ,5 ]
Murakami, Tomokazu [1 ]
Affiliations
[1] Hitachi Ltd, R&D Grp, Tokyo, Japan
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
[4] Hitachi, Tokyo, Japan
[5] Alibaba, Hangzhou, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
RECOGNITION;
DOI
10.1109/ICCV.2019.00875
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Unlike vision modalities, body-worn sensors and passive sensing can avoid the failures of action understanding caused by vision-related challenges such as occlusion and appearance variation. However, no standard large-scale dataset exists that integrates different types of modalities across vision and sensors. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi, and pressure signals. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves action recognition performance compared to models trained with RGB information only. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene, and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
Pages: 8657 - 8666
Number of pages: 10
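To make the adaptive knowledge-transfer idea from the abstract concrete, below is a minimal PyTorch sketch of attention-weighted multi-teacher distillation, assuming an RGB "student" and several sensor "teachers" that each emit class logits. The module and function names (SensorAttention, distillation_loss) and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: distill from several sensor "teachers" into an RGB "student",
# weighting each teacher's soft targets with a learned attention score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SensorAttention(nn.Module):
    """Scores each sensor teacher so its soft targets are weighted adaptively per sample."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.score = nn.Linear(num_classes, 1)  # per-teacher score derived from its logits

    def forward(self, teacher_logits: torch.Tensor) -> torch.Tensor:
        # teacher_logits: (batch, num_teachers, num_classes)
        scores = self.score(teacher_logits).squeeze(-1)   # (batch, num_teachers)
        weights = F.softmax(scores, dim=-1)                # attention over teachers
        # Attention-weighted soft target: (batch, num_classes)
        return (weights.unsqueeze(-1) * teacher_logits).sum(dim=1)


def distillation_loss(student_logits, teacher_logits, labels, attention, T=4.0, alpha=0.5):
    """Cross-entropy on ground-truth labels plus KL distillation toward the attended sensor targets."""
    soft_target = attention(teacher_logits)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(soft_target / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    batch, num_classes, num_teachers = 8, 37, 4   # 37 action classes as in MMAct
    attention = SensorAttention(num_classes)
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits = torch.randn(batch, num_teachers, num_classes)
    labels = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels, attention)
    loss.backward()
    print(float(loss))
```

The per-sample attention lets more reliable sensor teachers dominate the soft target, which is the intuition behind the adaptive sensor-to-vision transfer described in the abstract; the actual MMAct model operates on features from full modality-specific networks rather than raw random logits.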