On the Design of Web Crawlers for Constructing an Efficient Chinese-Portuguese Bilingual Corpus System

Cited by: 0
Authors
Cheong, Sio Tai [1 ]
Xu, Jiabo [1 ]
Liu, Yue [1 ]
Affiliation
[1] Macao Polytech Inst, Macau, Peoples R China
Source
2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC) | 2018
Keywords
Machine Translation; Web Crawler; Bilingual Corpus; Machine Learning; NLP;
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Machine Translation has been a popular and important topic in Natural Language Processing (NLP) over the last few decades. This paper focuses on the design of web crawlers for constructing a Chinese-Portuguese bilingual corpus, which is intended for use in corresponding Machine Translation systems. It carries out a bilingual corpus construction process, starting from bilingual corpus collection with web crawlers built for different sources. The system can thus be considered an innovative and reasonable attempt at building Chinese-Portuguese bilingual corpora, and it solves some practical problems at the initial stage of corpus construction.
Pages: 9-12
Page count: 4