An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia

Cited by: 17

Authors:

Qin, Ying [1]

Wu, Yuzhong [1]

Lee, Tan [1]

Kong, Anthony Pak Hin [2]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Elect Engn, Shatin, Hong Kong, Peoples R China

[2] Univ Cent Florida, Sch Commun Sci & Disorders, Orlando, FL 32816 USA

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020, Vol. 92, No. 8

Funding:

U.S. National Institutes of Health (NIH)

Keywords:

Pathological speech assessment; End-to-end; Aphasia; Cantonese; Deep neural network; Classification

DOI:

10.1007/s11265-019-01511-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on the extracted features. Given the great variety of speech and language disorders, feature design is never a straightforward task, yet it is crucial to assessment performance. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task that discriminates PWA with high subjective assessment scores from those with low scores. A 2-layer Gated Recurrent Unit (GRU) model and a Convolutional Neural Network (CNN) model are applied to realize the end-to-end mapping from basic speech features to the classification outcome. The pathology-specific features used for assessment are learned implicitly by the neural network model. The Class Activation Mapping (CAM) method is used to visualize how the learned features contribute to the assessment result. Experimental results show that the end-to-end approach achieves performance comparable to the conventional two-step approach in the classification task, and that the CNN model is able to learn impairment-related features similar to the hand-crafted features. The results also indicate that the CNN model performs better than the 2-layer GRU model in this specific task.
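
To illustrate the Class Activation Mapping idea mentioned in the abstract, here is a minimal NumPy sketch. It assumes the original CAM setting: a CNN whose last convolutional layer is followed by global average pooling and a single linear layer. This record does not describe the authors' architecture, so the function names `class_activation_map` and `gap_logit`, the array shapes, and the random inputs are all hypothetical and serve only as an example.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last-layer feature maps for one class.

    feature_maps:  array of shape (K, T, F), K channels over a
                   time-by-frequency grid (assumed layout).
    class_weights: array of shape (K,), the linear-layer weights that
                   connect the pooled channels to the chosen class.
    Returns an array of shape (T, F).
    """
    return np.tensordot(class_weights, feature_maps, axes=([0], [0]))

def gap_logit(feature_maps, class_weights):
    """Class score from global average pooling plus a linear layer (no bias)."""
    pooled = feature_maps.mean(axis=(1, 2))  # shape (K,)
    return float(pooled @ class_weights)

# Hypothetical inputs: 4 channels on a 6 x 5 time-frequency grid.
rng = np.random.default_rng(0)
feature_maps = rng.random((4, 6, 5))
class_weights = rng.standard_normal(4)

cam = class_activation_map(feature_maps, class_weights)
logit = gap_logit(feature_maps, class_weights)
```

Under these assumptions the mean of the map equals the class score, which is why each cell of `cam` can be read as the contribution of one time-frequency region to the classification outcome.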


Pages: 819-830

Page count: 12
