A Study of Speech Phase in Dysarthria Voice Conversion System

Cited by: 0
Authors
Chen, Ko-Chiang [1]
Han, Ji-Yan [1]
Jhang, Sin-Hua [1]
Lai, Ying-Hui [1]
Affiliation
[1] National Yang Ming University, Department of Biomedical Engineering, Taipei, Taiwan
Source
FUTURE TRENDS IN BIOMEDICAL AND HEALTH INFORMATICS AND CYBERSECURITY IN MEDICAL DEVICES, ICBHI 2019 | 2020 / Vol. 74
Keywords
Voice conversion; Speech phase; Dysarthria; Neural networks
DOI
10.1007/978-3-030-30636-6_31
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Dysarthria is a communication disorder common in people whose neuromuscular apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving the intelligibility of dysarthric speech. However, most well-known VC methods convert only amplitude features and discard phase information, even though previous studies have indicated that phase is an important factor in the speech signal. We therefore investigate adding correct phase information to VC for dysarthric speech. Automatic speech recognition results and spectrum analysis show that replacing the dysarthric phase with the normal phase during the synthesis step improves intelligibility. This implies that correct phase information must be considered in dysarthria VC systems.
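The phase-replacement step described in the abstract can be illustrated with a short-time Fourier transform (STFT): keep the magnitude of one utterance, take the phase of another, and invert. The following is a minimal sketch assuming NumPy and SciPy, and assuming the dysarthric and normal utterances are already time-aligned and of equal length (the alignment itself is not shown). The function name `resynthesize_with_phase`, the 16 kHz sampling rate, and the 512-sample Hann window with 384-sample overlap are hypothetical choices, not the configuration used in the paper.

```python
import numpy as np
from scipy.signal import istft, stft


def resynthesize_with_phase(magnitude_source, phase_source, fs=16000,
                            nperseg=512, noverlap=384):
    """Hypothetical helper: |STFT| of one signal + phase of another -> waveform.

    Assumption: both inputs are 1-D, already time-aligned, and equal in length
    (the alignment step, e.g. DTW, is outside this sketch).
    """
    magnitude_source = np.asarray(magnitude_source, dtype=float)
    phase_source = np.asarray(phase_source, dtype=float)
    if magnitude_source.shape != phase_source.shape:
        raise ValueError("inputs must be time-aligned and of equal length")

    # Analysis: identical STFT settings for both signals so the bins line up.
    _, _, mag_spec = stft(magnitude_source, fs=fs, window="hann",
                          nperseg=nperseg, noverlap=noverlap)
    _, _, phase_spec = stft(phase_source, fs=fs, window="hann",
                            nperseg=nperseg, noverlap=noverlap)

    # Keep the amplitude of the first signal, take the phase of the second.
    combined = np.abs(mag_spec) * np.exp(1j * np.angle(phase_spec))

    # Synthesis: inverse STFT (overlap-add), trimmed to the input length.
    _, y = istft(combined, fs=fs, window="hann",
                 nperseg=nperseg, noverlap=noverlap)
    return y[: magnitude_source.shape[0]]
```

For example, `resynthesize_with_phase(converted, normal)` (hypothetical variable names) would pair a converted signal's magnitude with the phase of an aligned normal utterance, while passing the same signal twice simply reconstructs it. Because an arbitrary magnitude/phase pairing is generally not a consistent STFT, the output's magnitude only approximates the requested one.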
Pages: 219-226
Number of pages: 8