Lexicon plus TX: rapid construction of a multilingual lexicon with under-resourced languages

Cited by: 1
Authors
Lim, Lian Tze [1, 2]
Soon, Lay-Ki [2 ]
Lim, Tek Yong [2 ]
Tang, Enya Kong [3 ]
Ranaivo-Malancon, Bali [4 ]
Affiliations
[1] KDU Coll Penang, Sch Engn Sci & Technol, Georgetown 10400, Penang, Malaysia
[2] Multimedia Univ, Fac Comp & Informat, Cyberjaya 63100, Selangor, Malaysia
[3] Bandar Univ Teknol Legenda, Linton Univ Coll, Persiaran UTL, Mantin 71700, Negeri Sembilan, Malaysia
[4] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
Keywords
Multilingual lexicon; Under-resourced languages; Malay; Iban
DOI
10.1007/s10579-013-9253-0
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collection. We show how multilingual lexicons with under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2% of 500 random multilingual entries in the generated lexicon require minimal or no human correction.
Pages: 479-492
Number of pages: 14