Automatic construction of web-based English/Chinese parallel corpora

Cited by: 1
Authors
Tan Bin [1]
Zhou Xu-yan [1, 2]
Affiliations
[1] Jinggangshan Univ, Dept Comput, Jian 343009, Jiangxi, Peoples R China
[2] East China JiaoTong Univ, Nanchang 330000, Jiangxi, Peoples R China
Source
2010 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY AND SECURITY INFORMATICS (IITSI 2010) | 2010
Keywords
Parallel corpora; vector space; Jacobi correlation coefficient
DOI
10.1109/IITSI.2010.124
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As the demand for global information increases significantly, multilingual corpora have become a valuable linguistic resource for cross-lingual information retrieval and natural language processing. A technique for automatically constructing a web-based English-Chinese bilingual parallel corpus addresses the shortage of English-Chinese parallel corpora. First, web pages that may be translations of each other are mined from a particular source. Candidate parallel page pairs are then extracted according to the similarity of the pages' external characteristics, and each candidate pair is assessed with content-based methods. For this assessment of candidate parallel page pairs, the paper designs the ECVS model, which assesses bilingual text similarity and is based on the classic vector space model.
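The content-based assessment step can be illustrated with a minimal sketch of the classic vector space model applied to a bilingual page pair. This is not the paper's ECVS model, whose details are not given in the abstract. It assumes pre-tokenized text, a hypothetical toy lexicon (`TOY_LEXICON`) for projecting English terms into the Chinese term space, raw term frequencies as weights, and cosine similarity as the score. The function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy English -> Chinese lexicon, for illustration only.
TOY_LEXICON = {
    "parallel": "平行",
    "corpus": "语料库",
    "web": "网络",
    "translation": "翻译",
}


def project_english(tokens, lexicon):
    """Map English tokens into the Chinese term space; unknown words are dropped."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_similarity(english_tokens, chinese_tokens, lexicon=TOY_LEXICON):
    """Score a candidate page pair; higher values suggest a likelier translation pair."""
    english_vec = project_english(english_tokens, lexicon)
    chinese_vec = Counter(chinese_tokens)
    return cosine(english_vec, chinese_vec)
```

In a pipeline like the one the abstract outlines, a score of this kind would be compared against a threshold to accept or reject each candidate pair; the threshold and the term weighting are design choices the abstract does not specify.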
Pages: 114-117
Page count: 4
Related papers
50 in total
  • [1] Automatic construction of English/Chinese parallel corpora
    Yang, CC
    Li, KW
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2003, 54 (08): 730-742
  • [2] NNexus: An Automatic Linker for Collaborative Web-Based Corpora
    Gardner, James
    Krowne, Aaron
    Xiong, Li
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (06): 829-839
  • [3] Web-based parallel corpora for statistical machine translation
    Li, Bo
    Liu, Juan
    Shi, Wenjuan
    ICMLA 2007: SIXTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2007: 444-449
  • [4] On the Construction of Web-based English Testing Platform
    Li, Qingqing
    Liang, Xia
    MECHANICAL, MATERIALS AND MANUFACTURING ENGINEERING, PTS 1-3, 2011, 66-68: 2224-+
  • [5] Automatic acquisition of Chinese-English parallel corpus from the web
    Zhang, Ying
    Wu, Ke
    Gao, Jianfeng
    Vines, Phil
    ADVANCES IN INFORMATION RETRIEVAL, 2006, 3936: 420-431
  • [6] A web-based Chinese automatic question answering system
    Cai, DF
    Cui, H
    Miao, XL
    Zhao, CG
    Ren, XS
    FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2004: 1141-1146
  • [7] Extracting Historical Terms Based on Aligned Chinese-English Parallel Corpora
    Li, Xiuying
    Che, Chao
    Han, Limin
    Liu, Xiaoxia
    IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2009: 296-301
  • [8] Corpora Processing and Computational Scaffolding for a Web-based English Learning Environment: The CANDLE Project
    Liou, Hsien-Chin
    Chang, Jason S.
    Chen, Hao-Jan
    Lin, Chih-Cheng
    Liaw, Meei-Ling
    Gao, Zhao-Ming
    Jang, Jyh-Shing Roger
    Yeh, Yuli
    Chuang, Thomas C.
    You, Geeng-Neng
    CALICO JOURNAL, 2007, 24 (01): 77-95
  • [9] Determining the semantic orientation of web-based corpora
    Scharl, A
    Pollach, I
    Bauer, C
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690: 840-849
  • [10] Mining large samples of web-based corpora
    Scharl, A
    Bauer, C
    KNOWLEDGE-BASED SYSTEMS, 2004, 17 (5-6): 229-233