Creating domain-specific translation memories for machine translation finetuning: the TRENCARD bilingual cardiology corpus

Cited by: 0
Authors
Dogru, Gokhan [1]
Affiliations
[1] Univ Autonoma Barcelona, Bellaterra, Spain
Source
TRADUMATICA-TRADUCCIO I TECNOLOGIES DE LA INFORMACIO I LA COMUNICACIO | 2024 / Issue 22
Keywords
bilingual corpus preparation; translation memory; machine translation; TRENCARD corpus
DOI
10.5565/rev/tradumatica.313
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
This article investigates how translation memories (TMs) can be created by translators or other language professionals in order to compile domain-specific parallel corpora, which can then be used in different scenarios, such as machine translation training and fine-tuning, TM leveraging, and/or large language model fine-tuning. The article introduces a semi-automatic TM preparation methodology that primarily leverages translation tools used by translators, in the interests of data quality and control by translators themselves. This semi-automatic methodology is then used to build a cardiology-based Turkish -> English corpus from bilingual abstracts of Turkish cardiology journals. The resulting corpus, called TRENCARD Corpus, has approximately 800,000 source words and 50,000 sentences. Using this methodology, translators can build custom TMs in a reasonable time and use them in tasks requiring bilingual data.
Pages: 1-30
Page count: 30