Sibilant consonants classification comparison with multi- and single-class neural networks

Cited by: 9

Authors:

Anjos, Ivo [1]

Cavalheiro Marques, Nuno [1]

Grilo, Margarida [2]

Guimaraes, Isabel [2,3]

Magalhaes, Joao [1]

Cavaco, Sofia [1]

Affiliations:

[1] Univ Nova Lisboa, Fac Ciencias & Tecnol, Dept Comp Sci, NOVA LINCS, P-2829516 Caparica, Portugal

[2] Escola Super Saude Alcoitao, Alcabideche, Portugal

[3] Univ Lisbon, Fac Med, Inst Med Mol, Clin Pharmacol Unit, Lisbon, Portugal

Source:

EXPERT SYSTEMS | 2020 / Vol. 37 / Issue 06

Keywords:

deep learning; machine learning; neural networks; sibilant consonants; speech and language therapy; SPEECH-THERAPY; RECOGNITION; APHASIA;

DOI:

10.1111/exsy.12620

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, controlled by the children's voices in real time, to help children practice producing the European Portuguese (EP) sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any adult supervision, children can practice producing these sounds more often, which may lead to faster improvements in their speech. Recently, deep neural networks have brought considerable improvements to a variety of classification tasks, from image classification to speech and language processing. Here, we propose to use deep convolutional neural networks to classify the sibilant phonemes of EP in our serious game for speech and language therapy. We compared the performance of several artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks as input features. Our best deep learning model, a 2D convolutional model with log Mel filterbanks as input features, achieves classification scores of 95.48%. These results are then further improved for specific classes with simple binary classifiers.

Pages: 12

References: 36 in total
