Creating domain-specific translation memories for machine translation finetuning: the TRENCARD bilingual cardiology corpus

Cited by: 0
Author
Dogru, Gokhan [1]
Affiliation
[1] Univ Autonoma Barcelona, Bellaterra, Spain
Source
TRADUMATICA-TRADUCCIO I TECNOLOGIES DE LA INFORMACIO I LA COMUNICACIO | 2024 / Issue 22
Keywords
bilingual corpus preparation; translation memory; machine translation; TRENCARD corpus;
DOI
10.5565/rev/tradumatica.313
Chinese Library Classification
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This article investigates how translation memories (TMs) can be created by translators or other language professionals in order to compile domain-specific parallel corpora, which can then be used in different scenarios, such as machine translation training and fine-tuning, TM leveraging, and/or large language model fine-tuning. The article introduces a semi-automatic TM preparation methodology that primarily leverages translation tools used by translators, in the interests of data quality and control by translators themselves. This semi-automatic methodology is then used to build a cardiology-based Turkish -> English corpus from bilingual abstracts of Turkish cardiology journals. The resulting corpus, called TRENCARD Corpus, has approximately 800,000 source words and 50,000 sentences. Using this methodology, translators can build custom TMs in a reasonable time and use them in tasks requiring bilingual data.
Pages: 1 / 30
Page count: 30