Breaking the Unwritten Language Barrier: The BULB Project

Cited by: 44
Authors
Adda, Gilles [1]
Stueker, Sebastian [2]
Adda-Decker, Martine [1, 3]
Ambouroue, Odette [4]
Besacier, Laurent [5]
Blachon, David [5]
Bonneau-Maynard, Helene [1]
Godard, Pierre [1]
Hamlaoui, Fatima [6]
Idiatov, Dmitry [4]
Kouarata, Guy-Noel [3]
Lamel, Lori [1]
Makasso, Emmanuel-Moselly [6]
Rialland, Annie [3]
Van de Velde, Mark [4]
Yvon, Francois [1]
Zerbian, Sabine [7]
Affiliations
[1] Univ Paris Saclay, CNRS, LIMSI, St Aubin, France
[2] Karlsruhe Inst Technol, Inst Anthropomat & Robot, D-76021 Karlsruhe, Germany
[3] CNRS Paris 3 Sorbonne Nouvelle, LPP, Paris, France
[4] Langage Langues & Cultures Afrique Noire Lab LLAC, Villejuif, France
[5] LIG, GETALP Grp, Grenoble, France
[6] ZAS, Berlin, Germany
[7] Univ Stuttgart, Inst Linguist, Stuttgart, Germany
Source
SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES | 2016 / Vol. 81
Keywords
Language documentation; automatic phonetic transcription; unwritten languages; automatic alignment
DOI
10.1016/j.procs.2016.04.023
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims to support linguists in documenting unwritten languages. To achieve this, we develop tools tailored to the needs of documentary linguists, building on technology and expertise from natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed, we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. For this we use standard mobile devices and a dedicated software application, Lig-Aikuma. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at the phoneme level and of the French translation at the word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that support the linguists in their work, taking into account the linguists' needs and the technology's capabilities. © 2016 The Authors. Published by Elsevier B.V.
Pages: 8-14
Number of pages: 7