Towards Fully Automatic Annotation of Audiobooks for TTS

Cited by: 0

Authors:

Boeffard, Olivier [1]

Charonnat, Laure [1]

Le Maguer, Sebastien [1]

Lolive, Damien [1]

Vidal, Gaelle [1]

Affiliation:

[1] Univ Rennes 1, IRISA, Lannion, France

Source:

LREC 2012 - Eighth International Conference on Language Resources and Evaluation | 2012

Keywords:

audiobook; annotation; phone segmentation; speech; alignment

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

Building speech corpora is a crucial first step for every text-to-speech synthesis system. The statistical models now in use require very large corpora, which must be recorded, transcribed, annotated and segmented before they are usable. The variety of corpora needed for recent applications (content, style, etc.) makes existing digital audio resources very attractive. Among the available resources, audiobooks are of particular interest because of their quality. Within this framework, we propose a complete acquisition, segmentation and annotation chain for audiobooks that aims to be fully automatic. The proposed process relies on a data structure, ROOTs, that establishes the relations between the different annotation levels, each represented as a sequence of items. This methodology has been applied successfully to 11 hours of speech extracted from an audiobook. A manual check on part of the corpus shows the efficiency of the process.


Pages: 975-980

Page count: 6
