Multilingual Sequence-to-Sequence Models for Hebrew NLP

Cited by: 0
Authors
Eyal, Matan [1 ]
Noga, Hila [1 ]
Aharoni, Roee [1 ]
Szpektor, Idan [1 ]
Tsarfaty, Reut [1 ]
Affiliations
[1] Google Research, Mountain View, CA 94043 USA
Source
Findings of the Association for Computational Linguistics (ACL 2023), 2023
Abstract
Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is beneficial for classification tasks, it does not cater well for sub-word prediction tasks, such as Named Entity Recognition, when considering the morphologically rich nature of Hebrew. In this paper we argue that sequence-to-sequence generative architectures are more suitable for large LMs in morphologically rich languages (MRLs) such as Hebrew. We demonstrate this by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, for which we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a separate, specialized, morpheme-based decoder. Using this approach, our experiments show substantial improvements over previously published results on all existing Hebrew NLP benchmarks. These results suggest that multilingual sequence-to-sequence models present a promising building block for NLP for MRLs.
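The text-to-text casting the abstract describes can be sketched as follows. This is an illustrative example only, not the paper's actual code: the `ner:` task prefix and the inline `[LABEL span]` tagging format are assumptions for illustration, showing how a token-level task like Named Entity Recognition can be reframed as string-to-string generation for a seq2seq model such as mT5.

```python
# Illustrative sketch (hypothetical format): casting NER as a text-to-text
# task, so a pretrained seq2seq model can emit the tagged sentence directly
# instead of requiring a specialized token-classification head.

def cast_ner_as_text_to_text(sentence, entities):
    """Build a (source, target) string pair for a seq2seq model.

    `entities` is a list of (span, label) pairs. The source string gets a
    task prefix; the target marks each entity span inline with its label.
    """
    source = "ner: " + sentence
    target = sentence
    for span, label in entities:
        target = target.replace(span, f"[{label} {span}]")
    return source, target

src, tgt = cast_ner_as_text_to_text(
    "Reut Tsarfaty works at Google in Tel Aviv",
    [("Reut Tsarfaty", "PER"), ("Google", "ORG"), ("Tel Aviv", "LOC")],
)
# src: "ner: Reut Tsarfaty works at Google in Tel Aviv"
# tgt: "[PER Reut Tsarfaty] works at [ORG Google] in [LOC Tel Aviv]"
```

Because both sides of the task are plain strings, the same pretrained model and vocabulary can serve every task in the pipeline, which is what eliminates the separate morpheme-based decoder.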
Pages: 7700-7708 (9 pages)