Building parallel corpora by automatic title alignment

Authors
Yang, CC [1]
Li, KW [1]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Sha Tin 100083, Peoples R China
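Source
DIGITAL LIBRARIES: PEOPLE, KNOWLEDGE, AND TECHNOLOGY, PROCEEDINGS | 2002 / Vol. 2555
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract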
Cross-lingual semantic interoperability has drawn significant research attention recently, as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish and French, has been widely explored, but CLIR across European and Oriental languages is still in its early stages. To cross the language boundary, a corpus-based approach shows promise in overcoming the limitations of knowledge-based and controlled-vocabulary approaches. However, collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques using these approaches, and compare their performance in aligning English and Chinese titles of parallel documents available on the Web.
Pages: 328-339
Page count: 12
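
The abstract names two families of title-alignment techniques, length-based and text-based, without giving their details. The Python sketch below is only an illustration of the general idea and is not the method evaluated in the paper. The function names, the toy dictionary `TOY_DICT`, the sample titles, and the `expected_ratio` of 1.5 Chinese characters per English word are all assumptions made for this example.

```python
import re

# Hypothetical toy lexicon; a real system would load a full bilingual dictionary.
TOY_DICT = {
    "government": ["政府"],
    "budget": ["預算"],
    "rain": ["雨"],
    "warning": ["警告"],
}


def length_score(en_title, zh_title, expected_ratio=1.5):
    """Length-based score in [0, 1].

    Compares Chinese characters per English word against an assumed
    expected ratio; 1.0 means the observed ratio matches it exactly.
    """
    en_len = len(en_title.split())
    # Count only characters in the CJK Unified Ideographs block.
    zh_len = sum(1 for ch in zh_title if "\u4e00" <= ch <= "\u9fff")
    if en_len == 0 or zh_len == 0:
        return 0.0
    ratio = zh_len / en_len
    return min(ratio, expected_ratio) / max(ratio, expected_ratio)


def text_score(en_title, zh_title, dictionary):
    """Text-based score in [0, 1].

    Fraction of English words for which at least one dictionary
    translation occurs as a substring of the Chinese title.
    """
    words = re.findall(r"[a-z]+", en_title.lower())
    if not words:
        return 0.0
    hits = sum(
        1 for w in words
        if any(t in zh_title for t in dictionary.get(w, []))
    )
    return hits / len(words)


def align(en_titles, zh_titles, score):
    """For each English title, return the index of the best-scoring Chinese title."""
    return [
        max(range(len(zh_titles)), key=lambda j: score(en, zh_titles[j]))
        for en in en_titles
    ]


en_titles = ["Government announces new budget", "Heavy rain warning issued"]
zh_titles = ["暴雨警告生效", "政府公布新預算"]

text_alignment = align(
    en_titles, zh_titles, lambda e, z: text_score(e, z, TOY_DICT)
)
print(text_alignment)  # [1, 0]
```

On this toy data the text-based score pairs each English title with its Chinese counterpart. A length-based score alone can be ambiguous when candidate titles have similar lengths, which is one reason a comparison of the two families is informative.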