Named entity recognition for the Kazakh language

Cited by: 1
Authors
Kozhirbayev, Z. M. [1 ]
Yessenbayev, Z. A. [1 ]
Affiliations
[1] Private Inst, Natl Lab Astana Nur Sultan, Astana, Kazakhstan
Keywords
named entity recognition; conditional random field; long-term short-term memory; word embeddings;
DOI
10.26577/JMMCS.2020.v107.i3.06
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Named entity recognition (NER) is one of the important tasks in natural language processing (NLP). It identifies real-world objects mentioned in a sentence, such as geographical locations, person names, and organizations. Existing approaches to the problem include manually created grammar rules, statistical models such as machine learning, and hybrid methods. The aim of this work is to experiment with statistical and machine learning methods and to check how they handle the agglutinative Kazakh language. This paper presents named entity recognition based on a statistical machine learning approach, the conditional random field (CRF). We also use a hybrid approach that combines a bidirectional long short-term memory (LSTM) neural network with a CRF model, a modern approach to named entity recognition. The CRF model tuned with cross-validated randomized search achieves an F1 score of 0.95. The hybrid LSTM-CRF model achieves an F1 score of 0.88; although lower, this result is still good, and unlike the CRF model, the hybrid model does not require task-specific feature design. For the experiments, a corpus (kazNER) was created for the NER task with labels for person names, locations, organizations, and other entities. The corpus consists of 29,629 sentences, each containing at least one proper noun, taken from text that was annotated only with part-of-speech tags.
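To illustrate the kind of feature design a CRF tagger typically needs (and which the abstract says the LSTM-CRF model avoids), here is a minimal sketch. The feature set, the function names `token_features` and `sentence_features`, and the example sentence are assumptions made for illustration; they are not taken from the paper.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i]. Feature names are illustrative only."""
    word = tokens[i]
    feats = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        # Suffix features: in an agglutinative language, case and plural
        # endings attach to the stem, so word endings can carry useful signal.
        "suffix2": word[-2:].lower(),
        "suffix3": word[-3:].lower(),
    }
    # Context features from the neighbouring tokens, with sentence-boundary flags.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical example sentence: "Abai lives in Almaty".
sentence = ["Абай", "Алматыда", "тұрады"]
features = sentence_features(sentence)
```

A list of per-token dicts like `features` is the input shape that CRF toolkits such as sklearn-crfsuite accept; a tagger trained on it would then predict one entity label per token.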
Pages: 57-66
Page count: 10