Dysarthric speaker identification with constrained training durations

Citations: 0
Authors
Chaiani, Mounira [1]
Bengherabi, Messaoud [2]
Selouani, Sid Ahmed [3]
Boudraa, Malika [1]
Affiliations
[1] Univ Sci & Technol Houari Boumediene, Fac Elect & Comp Sci, Algiers, Algeria
[2] CDTA, Algiers, Algeria
[3] Univ Moncton, Dept Informat Management, Campus Shippagan, Shippegan, NB E8S 1P6, Canada
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL, IMAGE, VISION AND THEIR APPLICATIONS (SIVA) | 2018
Keywords
Dysarthria; speaker identification; energy based VAD; MFCC; Auditory cues; GMM; scores fusion; Q-statistic; SPEECH
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Dysarthria is a neurological speech disorder that causes phonemes to be mispronounced or omitted. To support biometric identification of dysarthric speakers under constrained training scenarios, this paper proposes a recognition framework based on score-level fusion of two systems: the first uses classical Mel Frequency Cepstral Coefficients (MFCCs), while the second uses Auditory Cues (ACs), which simulate the outer, middle and inner parts of the ear. A simple energy-based voice activity detector (VAD) is incorporated in both systems and its impact on performance is evaluated. Experiments are carried out on the Nemours and Torgo databases, with Gaussian Mixture Models (GMMs) used for speaker modeling. The results demonstrate the effectiveness of the energy-based VAD, especially for the MFCC-based system. Moreover, the complementarity of the two feature sets is shown by a significant gain in the identification performance of the fused system across different training durations. The proposed system surpasses state-of-the-art results and achieves 100% correct speaker identification in the long-duration training scenario.
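The two generic building blocks named in the abstract, an energy-based VAD and score-level fusion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the VAD threshold, the frame settings, the score normalisation, or the fusion rule, so the function names `energy_vad` and `fuse_scores`, the 30 dB threshold relative to the loudest frame, min-max normalisation, and the weighted sum are all assumptions chosen for illustration.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-30.0):
    """Hypothetical energy-based VAD: keep frames whose log energy lies
    within |threshold_db| dB of the loudest frame (all values assumed)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] for i in range(n_frames)]
    )
    # Small constant avoids log10(0) on silent frames.
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy_db > (energy_db.max() + threshold_db)

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Hypothetical score-level fusion: min-max normalise each system's
    per-speaker scores, then combine them with a weighted sum."""
    def minmax(scores):
        scores = np.asarray(scores, dtype=float)
        span = scores.max() - scores.min()
        if span == 0:
            return np.zeros_like(scores)
        return (scores - scores.min()) / span
    return weight * minmax(scores_a) + (1.0 - weight) * minmax(scores_b)

# Toy usage: silence, a 200 Hz tone at 16 kHz sampling, then silence again.
t = np.arange(1600) / 16000.0
toy_signal = np.concatenate(
    [np.zeros(1600), np.sin(2 * np.pi * 200 * t), np.zeros(1600)]
)
speech_mask = energy_vad(toy_signal)

# Made-up per-speaker scores from two systems (e.g. MFCC-based and AC-based).
fused = fuse_scores([-10.0, -5.0, -8.0], [0.2, 0.9, 0.1])
identified_speaker = int(np.argmax(fused))
```

In a full pipeline, the mask would select which frames feed feature extraction, and the per-speaker scores would typically be GMM log-likelihoods from each feature stream. The fusion weight would need tuning on held-out data.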
Pages: 6