ODSQA: OPEN-DOMAIN SPOKEN QUESTION ANSWERING DATASET

Cited by: 0
Authors
Lee, Chia-Hsuan [1]
Wang, Shang-Ming [1]
Chang, Huan-Cheng [1]
Lee, Hung-Yi [1]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
Source
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018) | 2018
Keywords
spoken question answering; TALKING;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reading comprehension by machine has been widely studied, but machine comprehension of spoken content remains a less investigated problem. In this paper, we release the Open-Domain Spoken Question Answering Dataset (ODSQA), which contains more than three thousand questions. To the best of our knowledge, this is the largest real SQA dataset. On this dataset, we found that automatic speech recognition (ASR) errors have a catastrophic impact on SQA. To mitigate the effect of ASR errors, we incorporate subword units, which brings consistent improvements across all the models. We further found that data augmentation on text-based QA training examples can improve SQA.
Pages: 949-956
Page count: 8