Continuous Vocal Imitation with Self-organized Vowel Spaces in Recurrent Neural Network

Times cited: 0

Authors:

Kanda, Hisashi [1]

Ogata, Tetsuya [1]

Takahashi, Toru [1]

Komatani, Kazunori [1]

Okuno, Hiroshi G. [1]

Affiliation:

[1] Kyoto Univ, Grad Sch Informat, Dept Intelligence Sci & Technol, Kyoto, Japan

Source:

ICRA: 2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-7 | 2009

Keywords:

SPEECH; SYSTEMS; SOUNDS

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

A continuous vocal imitation system was developed using a computational model that explains how infants acquire phonemes. Human infants perceive speech not as sequences of discrete phonemes but as continuous acoustic signals. One of the critical problems in phoneme acquisition is therefore how to segment this continuous speech. The key idea for solving this problem is that articulatory mechanisms, such as the vocal tract, help humans perceive speech-sound units that correspond to phonemes. To segment acoustic signals together with articulatory movements, we apply a segmentation method based on a Recurrent Neural Network with Parametric Bias (RNNPB) to our system. This method determines multiple segmentation boundaries in a temporal sequence from the prediction error of the RNNPB model, and the PB values it obtains can be regarded as encoding a kind of phoneme. Our system was implemented using a physical vocal tract model called the Maeda model. Experimental results demonstrated that the system can self-organize the same phonemes across different continuous sounds and can imitate vocal sounds containing an arbitrary number of vowels by using the vowel space in the RNNPB. This suggests that our model reflects the process of phoneme acquisition.


Pages: 4036-4041

Number of pages: 6

Cited references (18 in total; the first 10 are listed):

[1] de Boer, B. Self-organization in vowel systems. JOURNAL OF PHONETICS, 2000, 28(04): 441-465

[2] Fadiga, L; Craighero, L; Buccino, G; Rizzolatti, G. Speech listening specifically modulates the excitability of tongue muscles: a TMS study. EUROPEAN JOURNAL OF NEUROSCIENCE, 2002, 15(02): 399-402

[3] Hickok G, 2003, J COGNITIVE NEUROSCI, V15, P673, DOI 10.1162/089892903322307393

[4] Ishizuka, Kentaro; Mugitani, Ryoko; Kato, Hiroko; Amano, Shigeaki. Longitudinal developmental changes in spectral peaks of vowels produced by Japanese infants. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 121(04): 2272-2282

[5] JORDAN M, 1986, ANN C COG SCI SOC, P513

[6] KANDA H, 2008, IEEE RSJ IROS 2008

[7] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. SPEECH COMMUNICATION, 1999, 27(3-4): 187-207

[8] KITAWAKI N, 1978, T IECE JAP A, V61, P119

[9] Kuhl, Patricia K.; Conboy, Barbara T.; Coffey-Corina, Sharon; Padden, Denise; Rivera-Gaxiola, Maritza; Nelson, Tobey. Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2008, 363(1493): 979-1000

[10] Liberman AM, 1962, P SPEECH COMM SEM ST
