Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention

Cited by: 0
Authors:
Huang, Qiang [1]
Hain, Thomas [1]
Affiliations:
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Source:
INTERSPEECH 2019 | 2019
Keywords:
mismatch detection; deep learning; attention
DOI:
10.21437/Interspeech.2019-2125
Chinese Library Classification (CLC):
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject Classification Codes:
100104; 100213
Abstract:
In this paper, we propose to detect mismatches between speech and its transcription using deep neural networks. Although many speech-related applications assume that speech and transcriptions match, transcription errors are hard to avoid in practice, and training a model on mismatched data can degrade its performance. Instead of detecting errors by computing the distance between manual transcriptions and text strings obtained from a speech recogniser, we view mismatch detection as a classification task and merge speech and transcription features using deep neural networks. To enhance detection ability, we employ a cross-modal attention mechanism that learns the relevance between the features obtained from the two modalities. To evaluate the effectiveness of our approach, we test it on Factored WSJCAM0, into which we randomly inject three kinds of mismatch: word deletion, insertion, and substitution. To test its robustness, we train our models on a small number of samples and detect mismatches with varying numbers of words removed, inserted, or substituted. In our experiments, the results show that our approach achieves detection performance close to 80% on insertion and deletion and outperforms the baseline.
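The abstract describes fusing speech features with transcription features through cross-modal attention and treating mismatch detection as a binary classification task. The following PyTorch sketch illustrates that general idea only; the encoder types, feature dimensions, dot-product attention form, pooling, and all names (e.g. CrossModalMismatchDetector) are illustrative assumptions, not the paper's exact architecture.

    # Minimal sketch of cross-modal attention for speech/transcription
    # mismatch detection. Speech frames attend over transcription word
    # embeddings; the attended summary is fused with the speech
    # representation and a classifier predicts match vs. mismatch.
    # All dimensions and layer choices are assumed, not from the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossModalMismatchDetector(nn.Module):
        def __init__(self, speech_dim=40, text_dim=128, hidden_dim=128):
            super().__init__()
            # Encode each modality into a shared hidden space (assumed GRUs).
            self.speech_enc = nn.GRU(speech_dim, hidden_dim, batch_first=True)
            self.text_enc = nn.GRU(text_dim, hidden_dim, batch_first=True)
            # Projections for scaled dot-product cross-modal attention.
            self.query = nn.Linear(hidden_dim, hidden_dim)
            self.key = nn.Linear(hidden_dim, hidden_dim)
            self.value = nn.Linear(hidden_dim, hidden_dim)
            # Classifier over the fused representation: match vs. mismatch.
            self.classifier = nn.Linear(2 * hidden_dim, 2)

        def forward(self, speech, text):
            # speech: (B, T_s, speech_dim); text: (B, T_t, text_dim)
            s, _ = self.speech_enc(speech)   # (B, T_s, H)
            t, _ = self.text_enc(text)       # (B, T_t, H)
            # Each speech frame queries the transcription tokens.
            q, k, v = self.query(s), self.key(t), self.value(t)
            scores = torch.bmm(q, k.transpose(1, 2)) / (q.size(-1) ** 0.5)
            attn = F.softmax(scores, dim=-1)           # (B, T_s, T_t)
            attended = torch.bmm(attn, v)              # (B, T_s, H)
            # Fuse speech features with their attended text summary, then pool.
            fused = torch.cat([s, attended], dim=-1).mean(dim=1)  # (B, 2H)
            return self.classifier(fused)              # logits: (B, 2)

    # Usage with random tensors standing in for acoustic features and
    # word embeddings (shapes are illustrative).
    model = CrossModalMismatchDetector()
    speech = torch.randn(4, 200, 40)   # 4 utterances, 200 frames, 40-dim
    text = torch.randn(4, 12, 128)     # 4 transcriptions, 12 embedded words
    logits = model(speech, text)       # (4, 2): match vs. mismatch scores

The attention matrix here lets every speech frame weight the transcription words by relevance, which is one plausible reading of how learning cross-modal relevance could expose inserted, deleted, or substituted words.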
Pages: 584-588
Number of pages: 5