Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning

Cited by: 0
Authors
Wang, Yang Yang [1]
Gao, Ke [2]
Hamad, Ali [1]
McCarthy, Brianna [2]
Kloepper, Ashley M. [2]
Lever, Teresa E. [2]
Bunyak, Filiz [1]
Affiliations
[1] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Dept Otolaryngol Head & Neck Surg, Columbia, MO USA
Source
2021 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR) | 2021
Keywords
Oral diadochokinesis; syllable detection; mouth/jaw motion; deep learning; HUMAN AGE ESTIMATION; PARKINSONS-DISEASE; SPEECH; WORD;
DOI
10.1109/AIPR52630.2021.9762216
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neurological disorders such as Parkinson's disease (PD), stroke, and amyotrophic lateral sclerosis (ALS) cause oromotor dysfunctions resulting in significant speech and swallowing impairments. Assessment and monitoring of speech disorders offer effective and non-invasive opportunities for differential diagnosis and treatment monitoring of neurological disorders. Oral diadochokinesis (oral-DDK) is a widely used test conducted by speech-language pathologists (SLPs) to assess speech impairments. Unfortunately, analysis of oral-DDK tests relies on perceptual judgments by SLPs and is often subjective and qualitative, limiting its clinical value. In this paper, we propose a multi-modal oral-DDK test analysis system involving automated processing of complementary 1D audio and 2D video signals of both speech and swallowing function. The system aims to automatically generate objective and quantitative measures from oral-DDK tests to aid early diagnosis and treatment monitoring of neurological disorders. The audio signal analysis component of the proposed system involves a novel multi-scale deep learning network. The video signal analysis component involves tracking mouth and jaw motion during speech tests using our visual landmark tracking software. The proposed system has been evaluated on speech files corresponding to 9 different DDK speech syllables. The experimental results demonstrate promising audio syllable detection performance with an average count error of 1.6% across different types of oral-DDK speech tasks. Moreover, our preliminary results demonstrate the added value of combined audio and video signal analysis.
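The abstract reports syllable counting from the 1D audio signal using a multi-scale deep network, but gives no architectural details. As a rough, hypothetical illustration of why multi-scale analysis helps syllable counting (not the paper's actual method), the sketch below counts rising threshold crossings of a short-time energy envelope smoothed at several window sizes and takes the median count across scales, so that a spurious burst visible at only one scale does not change the result. All function names, window sizes, and the toy signal are illustrative assumptions.

```python
import numpy as np

def short_time_energy(signal, frame=256, hop=128):
    """Frame-wise energy envelope of a 1D audio signal."""
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([np.sum(signal[i * hop : i * hop + frame] ** 2)
                     for i in range(n)])

def count_syllables_multiscale(signal, scales=(3, 7, 15), rel_thresh=0.3):
    """Estimate a syllable count as the median, over several smoothing
    scales, of the number of rising threshold crossings of the energy
    envelope. Requiring agreement across scales suppresses counts that
    only appear at one level of temporal detail."""
    env = short_time_energy(signal)
    counts = []
    for w in scales:
        smooth = np.convolve(env, np.ones(w) / w, mode="same")
        # Prepend False so a signal starting above threshold counts once.
        above = np.concatenate(([False], smooth > rel_thresh * smooth.max()))
        counts.append(int(np.sum(above[1:] & ~above[:-1])))
    return int(np.median(counts))

# Toy recording: 6 noise bursts ("puh" repetitions) separated by near-silence.
rng = np.random.default_rng(0)
sr, burst, gap = 8000, 0.15, 0.35
pieces = []
for _ in range(6):
    pieces += [rng.normal(0, 1.0, int(sr * burst)),    # voiced burst
               rng.normal(0, 0.01, int(sr * gap))]     # inter-syllable gap
sig = np.concatenate(pieces)
print(count_syllables_multiscale(sig))
```

A learned multi-scale network would replace the fixed smoothing kernels with trainable convolutions at multiple receptive-field sizes, but the same intuition applies: syllable onsets should be detectable consistently across temporal scales.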
Pages: 6