Automatic Acquisition of Large-scale Academic Bilingual Parallel Corpus from the Web


Authors:

Han Yong ^{[1]}

Li Yu ^{[1]}

He Xiaoning ^{[1]}

Yang Muyun

Lei Guohua ^{[1]}

Affiliations:

[1] Heilongjiang Inst Technol, Comp Sci & Technol Dept, Harbin, Peoples R China

Source:

2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING | 2009

Funding:

National Natural Science Foundation of China;

Keywords:

data mining; bilingual parallel corpora acquisition; bilingual term acquisition;

DOI:

10.1109/IALP.2009.75

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we describe a system that automatically acquires a large-scale Chinese-English bilingual parallel corpus from the China Journals Full-text Database (CJFD), a component of the China National Knowledge Infrastructure (CNKI). The system obtains a large amount of parallel text with domain information from the structured bilingual texts already present in CJFD, such as the Chinese and English abstracts and titles of academic articles. The acquired Chinese-English parallel corpus is several orders of magnitude larger than similar corpora known to us. In addition, the system collects a large number of bilingual terms that can be applied directly to lexical acquisition.

Citation

Pages: 318-321

Page count: 4
