Improving Norwegian Translation of Bicycle Terminology Using Custom Named-Entity Recognition and Neural Machine Translation

Citations: 0
Authors
Hellebust, Daniel [1]
Lawal, Isah A. [1]
Affiliations
[1] Noroff Univ Coll, Dept Appl Data Sci, N-4612 Kristiansand, Norway
Keywords
cycling; machine translation; named-entity recognition; domain-specific; transfer learning
DOI
10.3390/electronics12102334
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Norwegian business-to-business (B2B) market for bicycles consists mainly of international brands, such as Shimano, Trek, Cannondale, and Specialized. The product descriptions for these brands are usually written in English and need to be translated locally. However, they contain bicycle-specific terminology that is challenging for online translators such as Google Translate. For this reason, local companies either outsource translation or translate product descriptions manually, which is cumbersome. Focusing on the Norwegian B2B bicycle industry, this paper explores transfer learning to improve the machine translation of bicycle-specific terminology, as well as generic text, from English to Norwegian. First, we trained a custom Named-Entity Recognition (NER) model to identify cycling-specific terminology; we then adapted a MarianMT neural machine translation model for the translation step. Because no bicycle-terminology datasets were publicly available to train the proposed models, we created our own dataset by collecting a corpus of cycling-related texts. We evaluated our proposed model and compared its performance with that of Google Translate. On the test set, our model outperformed Google Translate, with an average SacreBLEU score of 45.099 against 36.615 for Google Translate. We also created a web application in which the user can enter English text containing bicycle terminology; it returns the detected cycling-specific words together with a Norwegian translation.
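To give a sense of what the reported scores measure, the following is a minimal sketch of a BLEU-style corpus score on a 0 to 100 scale. It is not the authors' evaluation code and not the SacreBLEU implementation: it assumes whitespace tokenization, a single reference per sentence, and no smoothing, whereas SacreBLEU standardizes tokenization and other settings, so the numbers it produces are not comparable to those in the abstract. The function names and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenization,
    no smoothing, uniform weights over n-gram orders 1..max_n.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages hypotheses shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Hypothetical toy sentences, not drawn from the paper's dataset.
reference = ["bytt kjedet og juster giret"]
candidate = ["bytt kjedet og juster bakgiret"]
score = corpus_bleu(candidate, reference)
print(f"BLEU: {score:.3f}")
```

A higher score means the system output shares more word sequences with the human reference translation, which is the sense in which 45.099 is better than 36.615.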
Pages: 14