On the Design of Web Crawlers for Constructing an Efficient Chinese-Portuguese Bilingual Corpus System
Authors:
Cheong, Sio Tai [1]
Xu, Jiabo [1]
Liu, Yue [1]
Affiliation:
[1] Macao Polytech Inst, Macau, Peoples R China
Source:
2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC) | 2018
Keywords:
Machine Translation;
Web Crawler;
Bilingual Corpus;
Machine Learning;
NLP;
DOI:
Not available
Chinese Library Classification (CLC) codes:
TM [Electrical Engineering];
TN [Electronic Technology, Communication Technology];
Discipline classification codes:
0808;
0809;
Abstract:
Machine Translation has been a very popular and important topic in Natural Language Processing (NLP) over the last few decades. This paper focuses on the design of web crawlers for constructing a Chinese-Portuguese bilingual corpus, which would be used in corresponding Machine Translation systems. It carries out a bilingual corpus construction process, starting from bilingual corpus collection with web crawlers based on different sources. By this means, the system can be considered an innovative and reasonable attempt at setting up bilingual corpora for Chinese and Portuguese, and it has solved some practical problems at the initial stage of corpus construction.