Internet archive as a source of bilingual dictionary

Cited by: 1
Authors
Fattah, MA [1]
Ren, F [1]
Shingo, K [1]
Affiliation
[1] Univ Tokushima, Fac Engn, Tokushima 7708506, Japan
Source
ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS | 2004
Keywords
multilingual dictionaries; English/Arabic translation; multilingual thesaurus;
DOI
10.1109/ITCC.2004.1286650
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A parallel corpus is a very important resource for building a good machine translation system or for natural language processing research on cross-language information retrieval. The Internet archive is a good source of parallel documents in different languages. To construct a good parallel corpus from the Internet archive, a bilingual dictionary containing word pairs that may not exist in commercial dictionaries is a must. Extracting a bilingual dictionary from parallel documents on the Internet is important for adding words that are absent from traditional dictionaries. This paper describes two algorithms that automatically extract an English/Arabic bilingual dictionary from parallel texts in the Internet archive. The system should preferably be useful for many different language pairs. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. By controlling the system parameters, we could achieve 100% precision for the output bilingual dictionary, but at the cost of a smaller dictionary.
Pages: 298-302
Page count: 5