Comparison of Korean Preprocessing Performance according to Tokenizer in NMT Transformer Model

Cited by: 4
Authors
Kim, Geumcheol [1 ]
Lee, Sang-Hong [1 ]
Affiliations
[1] Anyang Univ, Dept Comp Sci & Engn, Anyang Si, South Korea
Funding
National Research Foundation of Singapore;
Keywords
translation; tokenizer; neural machine translation; natural language processing; deep learning;
DOI
10.12720/jait.11.4.228-232
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Machine translation using neural networks is making rapid progress in natural language processing. With the development of natural language processing models and tokenizers, accurate translation is becoming possible. In this paper, we build a Transformer model, which has recently shown high performance, and compare English-to-Korean translation performance according to the tokenizer. We built a Transformer-based Neural Machine Translation (NMT) model and compared the Korean translation results obtained with each tokenizer. The Byte Pair Encoding (BPE)-based tokenizer produced a small vocabulary and trained quickly, but, owing to the characteristics of Korean, its translation results were poor. The morphological-analysis-based tokenizer showed that when the parallel corpus is large and the vocabulary is large, performance is higher regardless of the characteristics of the language.
Pages: 228-232
Page count: 5
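The contrast drawn in the abstract between BPE subword tokenization and morphological-analysis-based tokenization of Korean can be illustrated at a small scale. The sketch below tokenizes the same Korean sentence both ways; the choice of libraries (SentencePiece for BPE, KoNLPy's Okt for morpheme analysis), the toy corpus, and all file names are illustrative assumptions, since the record does not state which tools the paper actually used.

```python
# Minimal sketch, assuming SentencePiece for BPE and KoNLPy's Okt for
# morphological analysis; the paper does not name its tokenizer toolkits.
# pip install sentencepiece konlpy   (konlpy additionally requires a JVM)
import tempfile

import sentencepiece as spm
from konlpy.tag import Okt

# Toy stand-in for a parallel-corpus source side: a few Korean sentences.
toy_corpus = [
    "자연어 처리에서 신경망을 이용한 기계 번역이 빠르게 발전하고 있다.",
    "토크나이저에 따라 한국어 번역 품질이 달라질 수 있다.",
    "형태소 분석 기반 토크나이저는 어휘 크기가 커지는 경향이 있다.",
]
sentence = toy_corpus[0]

# --- BPE-based tokenizer: learn a small subword vocabulary ---------------
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as f:
    f.write("\n".join(toy_corpus))
    corpus_path = f.name

# Writes bpe_demo.model / bpe_demo.vocab. vocab_size is a toy value;
# real NMT systems typically use tens of thousands of subword units.
spm.SentencePieceTrainer.Train(
    f"--input={corpus_path} --model_prefix=bpe_demo --model_type=bpe "
    "--vocab_size=150 --character_coverage=1.0 --hard_vocab_limit=false"
)
sp = spm.SentencePieceProcessor()
sp.load("bpe_demo.model")
print("BPE subwords:", sp.encode_as_pieces(sentence))

# --- Morphological-analysis-based tokenizer: split into morphemes --------
okt = Okt()
print("Morphemes:   ", okt.morphs(sentence))
```

Consistent with the abstract's observation, the BPE output stays close to surface subwords with a compact vocabulary, while the morpheme output separates particles and endings, which tends to enlarge the vocabulary as the parallel corpus grows.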