Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning

Cited by: 0
Authors
Wang, Yang Yang [1]
Gao, Ke [2]
Hamad, Ali [1]
McCarthy, Brianna [2]
Kloepper, Ashley M. [2]
Lever, Teresa E. [2]
Bunyak, Filiz [1]
Affiliations
[1] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Dept Otolaryngol Head & Neck Surg, Columbia, MO USA
Source
2021 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR) | 2021
Keywords
Oral diadochokinesis; syllable detection; mouth/jaw motion; deep learning; HUMAN AGE ESTIMATION; PARKINSONS-DISEASE; SPEECH; WORD;
DOI
10.1109/AIPR52630.2021.9762216
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neurological disorders such as Parkinson's disease (PD), stroke, and amyotrophic lateral sclerosis (ALS) cause oromotor dysfunctions resulting in significant speech and swallowing impairments. Assessment and monitoring of speech disorders offer effective and non-invasive opportunities for differential diagnosis and treatment monitoring of neurological disorders. Oral diadochokinesis (oral-DDK) is a widely used test conducted by speech-language pathologists (SLPs) to assess speech impairments. Unfortunately, analysis of oral-DDK tests relies on perceptual judgments by SLPs and is often subjective and qualitative, limiting its clinical value. In this paper, we propose a multi-modal oral-DDK test analysis system involving automated processing of complementary 1D audio and 2D video signals of both speech and swallowing function. The system aims to automatically generate objective and quantitative measures from oral-DDK tests to aid early diagnosis and treatment monitoring of neurological disorders. The audio signal analysis component of the proposed system involves a novel multi-scale deep learning network. The video signal analysis component involves tracking mouth and jaw motion during speech tests using our visual landmark tracking software. The proposed system has been evaluated on speech files corresponding to 9 different DDK speech syllables. The experimental results demonstrate promising audio syllable detection performance with an average count error of 1.6% across different types of oral-DDK speech tasks. Moreover, our preliminary results demonstrate the added value of combined audio and video signal analysis.
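The abstract reports syllable counting from the 1D audio signal using a multi-scale deep network, but gives no architectural details. As a rough, hypothetical illustration of why multi-scale analysis helps syllable counting (not the paper's actual method), the sketch below counts rising threshold crossings of a short-time energy envelope smoothed at several window sizes and takes the median count across scales, so that a spurious burst visible at only one scale does not change the result. All function names, window sizes, and the toy signal are illustrative assumptions.

```python
import numpy as np

def short_time_energy(signal, frame=256, hop=128):
    """Frame-wise energy envelope of a 1D audio signal."""
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([np.sum(signal[i * hop : i * hop + frame] ** 2)
                     for i in range(n)])

def count_syllables_multiscale(signal, scales=(3, 7, 15), rel_thresh=0.3):
    """Estimate a syllable count as the median, over several smoothing
    scales, of the number of rising threshold crossings of the energy
    envelope. Requiring agreement across scales suppresses counts that
    only appear at one level of temporal detail."""
    env = short_time_energy(signal)
    counts = []
    for w in scales:
        smooth = np.convolve(env, np.ones(w) / w, mode="same")
        # Prepend False so a signal starting above threshold counts once.
        above = np.concatenate(([False], smooth > rel_thresh * smooth.max()))
        counts.append(int(np.sum(above[1:] & ~above[:-1])))
    return int(np.median(counts))

# Toy recording: 6 noise bursts ("puh" repetitions) separated by near-silence.
rng = np.random.default_rng(0)
sr, burst, gap = 8000, 0.15, 0.35
pieces = []
for _ in range(6):
    pieces += [rng.normal(0, 1.0, int(sr * burst)),    # voiced burst
               rng.normal(0, 0.01, int(sr * gap))]     # inter-syllable gap
sig = np.concatenate(pieces)
print(count_syllables_multiscale(sig))
```

A learned multi-scale network would replace the fixed smoothing kernels with trainable convolutions at multiple receptive-field sizes, but the same intuition applies: syllable onsets should be detectable consistently across temporal scales.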
Pages: 6