MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information

Cited by: 2
Authors
Wang, Jianrong [1]
Huo, Yuchen [2]
Liu, Li [3]
Xu, Tianyi [1]
Li, Qi [4]
Li, Sen [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Tianjin Univ, Tianjin Int Engn Inst, Tianjin, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
Source
INTERSPEECH 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
Audio-Visual Speech Recognition; Mandarin Audio-Visual Corpus; Azure Kinect; Depth Information; SPEECH; RECOGNITION; TECHNOLOGY;
DOI
10.21437/Interspeech.2023-823
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Audio-visual speech recognition (AVSR) has attracted increasing attention from researchers as an important part of human-computer interaction. However, the available Mandarin audio-visual datasets are limited and lack depth information. To address this issue, this work establishes MAVD, a new large-scale Mandarin multimodal corpus comprising 12,484 utterances spoken by 64 native Chinese speakers. To ensure the dataset covers diverse real-world scenarios, a pipeline for cleaning and filtering the raw text material was developed to create well-balanced reading material. In particular, Microsoft's latest data acquisition device, Azure Kinect, is used to capture depth information in addition to the traditional audio signals and RGB images during data acquisition. We also provide a baseline experiment that can be used to evaluate the effectiveness of the dataset. The dataset and code will be released at https://github.com/SpringHuo/MAVD.
Pages: 2113-2117
Number of pages: 5