Hybrid Approach Text Generation for Low-Resource Language

Cited by: 0

Authors:

Rakhimova, Diana [1,2]

Adali, Esref [3]

Karibayeva, Aidana [1,2]

Affiliations:

[1] Al Farabi Kazakh Natl Univ, Alma Ata 050040, Kazakhstan

[2] Inst Informat & Computat Technol, Alma Ata 050010, Kazakhstan

[3] Istanbul Tech Univ, TR-34485 Istanbul, Turkiye

Source:

ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I | 2024 / Vol. 2165

Keywords:

Text generation; low-resource language; Kazakh language; Turkish languages; TF-IDF; RNN; LSTM

DOI:

10.1007/978-3-031-70248-8_20

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text generation is an important tool used by many companies in applications such as chatbots, search engines, and question-answering systems, and it is an active trend in artificial intelligence. Generated texts and sentences can serve both educational and entertainment purposes. Generating texts and sentences for children plays an important role in their development: it helps them improve their reading, comprehension, and communication skills in the language. Currently, many of the world's languages are classified as low-resource. The field of text generation for low-resource languages is still at an early stage of development, and many problems remain to be solved. One of the main problems is the lack of large datasets and linguistic resources in the public domain, which makes it difficult to apply modern machine learning methods effectively. Another is the lack of modern methods and tools for analyzing and processing these languages. This article presents a hybrid approach to text generation, using the Turkish and Kazakh languages as examples. These languages belong to the large group of Turkic languages, along with Kyrgyz, Tatar, Uzbek, and others. An approach based on neural learning with an LSTM model is proposed and implemented, taking into account the structural and semantic properties of the language. Training and testing are carried out on an assembled corpus covering various text genres. The quality of text generation was assessed using the BLEU metric.


Pages: 256-268

Page count: 13

References (26 in total):

[1] [Anonymous], 2021, WIKIPEDIA
[2] Birkett A., The 8 best AI text generators to 10X content production
[3] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[4] Dogan E, 2018, 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP)
[5] Dotton Z., A Grammar of Kazakh
[6] github, NLP-KazNU
[7] Joshi Pratik, 2020, P 58 ANN M ASS COMP, P6282, DOI 10.18653/V1/2020.ACL-MAIN.560
[8] Karyukin, Vladislav; Rakhimova, Diana; Karibayeva, Aidana; Turganbayeva, Aliya; Turarbek, Asem. The neural machine translation models for the low-resource Kazakh-English language pair [J]. PEERJ COMPUTER SCIENCE, 2023, 9
[9] Makhambetov, Olzhas; Makazhanov, Aibek; Sabyrgaliyev, Islam; Yessenbayev, Zhandos. Data-Driven Morphological Analysis and Disambiguation for Kazakh [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041: 151-163
[10] Mussayeva D., 2021, J. Theor. Appl. Inf. Technol., V99
