Sequence-to-Sequence Models and Their Evaluation for Spoken Language Normalization of Slovenian

Authors
Maucec, Mirjam Sepesy [1]
Verdonik, Darinka [1]
Donaj, Gregor [1]
Affiliation
[1] University of Maribor, Faculty of Electrical Engineering and Computer Science, SI-2000 Maribor, Slovenia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / No. 20
Keywords
low-resource language; spoken language; normalization; character unit; subword unit; statistical model; long short-term memory; transformer; error analysis
DOI
10.3390/app14209515
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Sequence-to-sequence models have been applied to many challenging problems in text and speech technologies, including normalization: transforming non-standard language forms into their standard counterparts. Non-standard forms occur in many written and spoken sources. This paper deals with one such source, namely speech in Slovenian, a less-resourced and highly inflected language. The paper explores speech corpora recently collected in public and private settings. We analyze the performance of three sequence-to-sequence models for automatic normalization from literal transcriptions to standard forms, using words, subwords, and characters as the basic units for normalization. We show that which approach performs best depends on the choice of basic modeling unit: statistical models perform best with words, while neural network-based models perform best with characters. Overall, the best results are obtained with character-based neural architectures, with long short-term memory (LSTM) and transformer architectures giving comparable results. We also present a novel analysis tool, which we use for an in-depth error analysis of the results obtained by the character-based models. This analysis shows that systems with similar overall results can perform differently on different types of errors. Errors produced by the transformer architecture are easier to correct in post-editing, an important insight because creating speech corpora is time-consuming and costly. The analysis tool also incorporates two statistical significance tests, approximate randomization and bootstrap resampling, both of which confirm that the neural network-based models improve on the statistical ones.
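The two significance tests named in the abstract can be illustrated with a minimal sketch. It assumes that each system has been scored on the same test sentences and that the score is a per-sentence error count (for example, edits against the reference normalization); this statistic, and the function names approximate_randomization and paired_bootstrap, are hypothetical illustrations of the general techniques, not the authors' analysis tool.

```python
import random
from typing import Sequence


def approximate_randomization(errors_a: Sequence[int], errors_b: Sequence[int],
                              trials: int = 10000, seed: int = 0) -> float:
    """Paired approximate randomization test (hypothetical helper).

    Assumes per-sentence error counts for systems A and B on the same sentences.
    Returns an estimated p-value for the null hypothesis that both systems
    perform equally well, using |total errors A - total errors B| as the statistic.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    observed = abs(sum(errors_a) - sum(errors_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0
        for a, b in zip(errors_a, errors_b):
            # Under the null hypothesis the A/B labels are exchangeable,
            # so each sentence's pair of scores is swapped with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)


def paired_bootstrap(errors_a: Sequence[int], errors_b: Sequence[int],
                     samples: int = 10000, seed: int = 0) -> float:
    """Paired bootstrap resampling (hypothetical helper).

    Resamples test sentences with replacement and returns the fraction of
    resampled test sets on which system A does NOT make fewer errors than
    system B. A small value supports the claim that A is better than B.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(errors_a)
    not_better = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement; both systems share them.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(errors_a[i] for i in idx) >= sum(errors_b[i] for i in idx):
            not_better += 1
    return not_better / samples
```

Both functions resample at the sentence level so that the two systems are always compared on the same inputs, which is what makes the tests paired. Approximate randomization asks how often a random relabeling of the outputs produces a difference at least as large as the observed one, while the bootstrap asks how stable the ranking of the two systems is across resampled test sets.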
Pages: 24