Automatic Acquisition of Large-scale Academic Bilingual Parallel Corpus from the Web
Times cited: 0
Authors: Han Yong [1]; Li Yu [1]; He Xiaoning [1]; Yang Muyun; Lei Guohua [1]
Affiliation: [1] Heilongjiang Inst Technol, Comp Sci & Technol Dept, Harbin, Peoples R China
Source: 2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009
Funding: National Natural Science Foundation of China
Keywords: data mining; bilingual parallel corpora acquisition; bilingual term acquisition
DOI: 10.1109/IALP.2009.75
Chinese Library Classification number: TP18 [Theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
In this paper, we describe a system that automatically acquires a large-scale Chinese-English bilingual parallel corpus from the China Journals Full-text Database (CJFD), a component of the China National Knowledge Infrastructure (CNKI). The system extracts a large amount of parallel text, together with domain information, from the structured bilingual texts already present in CJFD, such as the Chinese and English titles and abstracts of academic articles. The acquired Chinese-English parallel corpus is several orders of magnitude larger than any comparable corpus previously known to us. In addition, the system collects a large number of bilingual terms that can be applied directly to lexical acquisition.
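As a rough illustration of the extraction step described in the abstract, the following Python sketch pairs the Chinese and English title and abstract fields of a single article record and tags each pair with a domain label. Everything here is a hypothetical assumption for illustration, not the authors' implementation: the record layout (`BilingualRecord` and its field names), the use of a classification code such as TP18 as the domain label, and the position-based pairing of Chinese and English keywords into candidate bilingual terms.

```python
from dataclasses import dataclass, field


@dataclass
class BilingualRecord:
    """One journal article record (hypothetical layout; field names are illustrative)."""
    title_zh: str
    title_en: str
    abstract_zh: str
    abstract_en: str
    keywords_zh: list[str] = field(default_factory=list)
    keywords_en: list[str] = field(default_factory=list)
    domain: str = ""  # assumed domain label, e.g. a classification code such as "TP18"


def extract_pairs(record: BilingualRecord) -> list[tuple[str, str, str]]:
    """Pair the Chinese and English versions of the title and abstract.

    Each result is (chinese_text, english_text, domain). A field pair is
    skipped when either side is empty, since it cannot yield a parallel text.
    """
    pairs = []
    for zh, en in ((record.title_zh, record.title_en),
                   (record.abstract_zh, record.abstract_en)):
        zh, en = zh.strip(), en.strip()
        if zh and en:
            pairs.append((zh, en, record.domain))
    return pairs


def extract_terms(record: BilingualRecord) -> list[tuple[str, str]]:
    """Pair Chinese and English keywords as candidate bilingual terms.

    Assumption: the two keyword lists correspond by position. If their
    lengths differ, the alignment is unreliable, so no terms are returned.
    """
    if len(record.keywords_zh) != len(record.keywords_en):
        return []
    terms = []
    for zh, en in zip(record.keywords_zh, record.keywords_en):
        zh, en = zh.strip(), en.strip()
        if zh and en:
            terms.append((zh, en))
    return terms


if __name__ == "__main__":
    # Placeholder texts stand in for real record content.
    rec = BilingualRecord(
        title_zh="中文标题",
        title_en="English title",
        abstract_zh="中文摘要",
        abstract_en="English abstract",
        keywords_zh=["数据挖掘"],
        keywords_en=["data mining"],
        domain="TP18",
    )
    for pair in extract_pairs(rec):
        print(pair)
    print(extract_terms(rec))
```

Skipping records whose keyword counts differ trades recall for precision; a real system would more likely align terms with a dictionary or statistical model rather than by position alone.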
Pages: 318-321
Number of pages: 4