MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding

Cited by: 47
Authors
Kong, Quan [1 ]
Wu, Ziming [2 ,4 ]
Deng, Ziwei [1 ]
Klinkigt, Martin [1 ]
Tong, Bin [1 ,3 ,4 ,5 ]
Murakami, Tomokazu [1 ]
Affiliations
[1] Hitachi Ltd, R&D Grp, Tokyo, Japan
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
[4] Hitachi, Tokyo, Japan
[5] Alibaba, Hangzhou, Peoples R China
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
RECOGNITION;
DOI
10.1109/ICCV.2019.00875
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Unlike vision modalities, body-worn sensors and passive sensing can avoid the failures of action understanding caused by vision-related challenges such as occlusion and appearance variation. However, no standard large-scale dataset exists that integrates different types of modalities across vision and sensors. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi, and pressure signals. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves action recognition performance compared to models trained with RGB information only. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene, and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
Pages: 8657 - 8666
Number of pages: 10
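To make the adaptive knowledge-transfer idea from the abstract concrete, below is a minimal PyTorch sketch of attention-weighted multi-teacher distillation, assuming an RGB "student" and several sensor "teachers" that each emit class logits. The module and function names (SensorAttention, distillation_loss) and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: distill from several sensor "teachers" into an RGB "student",
# weighting each teacher's soft targets with a learned attention score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SensorAttention(nn.Module):
    """Scores each sensor teacher so its soft targets are weighted adaptively per sample."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.score = nn.Linear(num_classes, 1)  # per-teacher score derived from its logits

    def forward(self, teacher_logits: torch.Tensor) -> torch.Tensor:
        # teacher_logits: (batch, num_teachers, num_classes)
        scores = self.score(teacher_logits).squeeze(-1)   # (batch, num_teachers)
        weights = F.softmax(scores, dim=-1)                # attention over teachers
        # Attention-weighted soft target: (batch, num_classes)
        return (weights.unsqueeze(-1) * teacher_logits).sum(dim=1)


def distillation_loss(student_logits, teacher_logits, labels, attention, T=4.0, alpha=0.5):
    """Cross-entropy on ground-truth labels plus KL distillation toward the attended sensor targets."""
    soft_target = attention(teacher_logits)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(soft_target / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    batch, num_classes, num_teachers = 8, 37, 4   # 37 action classes as in MMAct
    attention = SensorAttention(num_classes)
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits = torch.randn(batch, num_teachers, num_classes)
    labels = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, labels, attention)
    loss.backward()
    print(float(loss))
```

The per-sample attention lets more reliable sensor teachers dominate the soft target, which is the intuition behind the adaptive sensor-to-vision transfer described in the abstract; the actual MMAct model operates on features from full modality-specific networks rather than raw random logits.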