TEXT-INFORMED SPEECH INPAINTING VIA VOICE CONVERSION

Cited by: 0
Authors
Prablanc, Pierre [1]
Ozerov, Alexey [1]
Duong, Ngoc Q. K. [1]
Perez, Patrick [1]
Affiliations
[1] Technicolor, 975 Ave Champs Blancs, CS 17616, F-35576 Cesson-Sévigné, France
Source
2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) | 2016
Keywords
Audio inpainting; speech inpainting; voice conversion; Gaussian mixture model; speech synthesis; MAXIMUM-LIKELIHOOD; TRANSFORMATION;
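DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification codes
0808; 0809;
Abstract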
The problem of speech inpainting consists of recovering parts of a speech signal that are missing for some reason. To the best of our knowledge, none of the existing methods allows satisfactory inpainting of long missing parts, such as one second or longer. In this work we address this challenging scenario. Since entire words can be lost when the missing parts are this long, we assume that the full text uttered in the speech signal is known. This leads to a new concept of text-informed speech inpainting. To solve this problem we propose a method that synthesizes the missing speech with a speech synthesizer, modifies its vocal characteristics via a voice conversion method, and fills in the missing part with the resulting converted speech sample. We carried out subjective listening tests to compare the proposed approach with two baseline methods.
Pages: 878-882
Number of pages: 5