Multilingual Sequence-to-Sequence Models for Hebrew NLP

Cited by: 0
Authors
Eyal, Matan [1 ]
Noga, Hila [1 ]
Aharoni, Roee [1 ]
Szpektor, Idan [1 ]
Tsarfaty, Reut [1 ]
Affiliations
[1] Google Research, Mountain View, CA 94043 USA
Source
Findings of the Association for Computational Linguistics (ACL 2023), 2023
Abstract
Recent work attributes progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Despite this, current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Additionally, previous work on pretrained Hebrew LMs focused on encoder-only models. While the encoder-only architecture is beneficial for classification tasks, it does not cater well for sub-word prediction tasks, such as Named Entity Recognition, when considering the morphologically rich nature of Hebrew. In this paper we argue that sequence-to-sequence generative architectures are more suitable for large LMs in morphologically rich languages (MRLs) such as Hebrew. We demonstrate this by casting tasks in the Hebrew NLP pipeline as text-to-text tasks, for which we can leverage powerful multilingual, pretrained sequence-to-sequence models such as mT5, eliminating the need for a separate, specialized, morpheme-based decoder. Using this approach, our experiments show substantial improvements over previously published results on all existing Hebrew NLP benchmarks. These results suggest that multilingual sequence-to-sequence models present a promising building block for NLP for MRLs.
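The text-to-text casting the abstract describes can be sketched as follows. This is an illustrative example only, not the paper's actual code: the `ner:` task prefix and the inline `[LABEL span]` tagging format are assumptions for illustration, showing how a token-level task like Named Entity Recognition can be reframed as string-to-string generation for a seq2seq model such as mT5.

```python
# Illustrative sketch (hypothetical format): casting NER as a text-to-text
# task, so a pretrained seq2seq model can emit the tagged sentence directly
# instead of requiring a specialized token-classification head.

def cast_ner_as_text_to_text(sentence, entities):
    """Build a (source, target) string pair for a seq2seq model.

    `entities` is a list of (span, label) pairs. The source string gets a
    task prefix; the target marks each entity span inline with its label.
    """
    source = "ner: " + sentence
    target = sentence
    for span, label in entities:
        target = target.replace(span, f"[{label} {span}]")
    return source, target

src, tgt = cast_ner_as_text_to_text(
    "Reut Tsarfaty works at Google in Tel Aviv",
    [("Reut Tsarfaty", "PER"), ("Google", "ORG"), ("Tel Aviv", "LOC")],
)
# src: "ner: Reut Tsarfaty works at Google in Tel Aviv"
# tgt: "[PER Reut Tsarfaty] works at [ORG Google] in [LOC Tel Aviv]"
```

Because both sides of the task are plain strings, the same pretrained model and vocabulary can serve every task in the pipeline, which is what eliminates the separate morpheme-based decoder.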
Pages: 7700-7708 (9 pages)