Towards a Standardized Dataset on Indonesian Named Entity Recognition

被引:0
|
作者
Khairunnisa, Siti Oryza [1 ]
Imankulova, Aizhan [1 ]
Komachi, Mamoru [1 ]
机构
[1] Tokyo Metropolitan Univ, 6-6 Asahigaoka, Hino, Tokyo 1910065, Japan
来源
AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP | 2020年
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In recent years, named entity recognition (NER) tasks in the Indonesian language have undergone extensive development. There are only a few corpora for Indonesian NER; hence, recent Indonesian NER studies have used diverse datasets. Although an open dataset is available, it includes only approximately 2,000 sentences and contains inconsistent annotations, thereby preventing accurate training of NER models without reliance on pre-trained models. Therefore, we re-annotated the dataset and compared the two annotations' performance using the Bidirectional Long Short-Term Memory and Conditional Random Field (BiLSTM-CRF) approach. Fixing the annotation yielded a more consistent result for the organization tag and improved the prediction score by a large margin. Moreover, to take full advantage of pre-trained models, we compared different feature embeddings to determine their impact on the NER task for the Indonesian language.
引用
收藏
页码:64 / 71
页数:8
相关论文
共 50 条
  • [1] Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language
    Khairunnisa, Siti Oryza
    Chen, Zhousi
    Komachi, Mamoru
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [2] A Named Entity Recognition Dataset for Turkish
    Kucuk, Dilek
    Kucuk, Dogan
    Arici, Nursal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 329 - 332
  • [3] Named Entity Recognition on Indonesian Microblog Messages
    Taufik, Natanael
    Wicaksono, Alfan F.
    Adriani, Mirna
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 358 - 361
  • [4] Transfer Learning for Indonesian Named Entity Recognition
    Kosasih, Joshua Aditya
    Khodra, Masayu Leylia
    2018 INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT INFORMATICS (SAIN), 2018, : 173 - 178
  • [5] ImNER Indonesian Medical Named Entity Recognition
    Suwarningsih, Wiwin
    Supriana, Iping
    Purwarianti, Ayu
    2014 2ND INTERNATIONAL CONFERENCE ON TECHNOLOGY, INFORMATICS, MANAGEMENT, ENGINEERING, AND ENVIRONMENT (TIME-E 2014), 2014, : 184 - 188
  • [6] KazNERD: Kazakh Named Entity Recognition Dataset
    Yeshpanov, Rustem
    Khassanov, Yerbolat
    Varol, Huseyin Atakan
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 417 - 426
  • [7] DroNER: Dataset for drone named entity recognition
    Silalahi, Swardiantara
    Ahmad, Tohari
    Studiawan, Hudan
    DATA IN BRIEF, 2023, 48
  • [8] Towards Bangla Named Entity Recognition
    Chowdhury, Shammur Absar
    Alam, Firoj
    Khan, Naira
    2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,
  • [9] Creating a Dataset for Named Entity Recognition in the Archaeology Domain
    Brandsen, Alex
    Verberne, Suzan
    Wansleeben, Milco
    Lambers, Karsten
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4573 - 4577
  • [10] HiNER: A Large Hindi Named Entity Recognition Dataset
    Murthy, Rudra
    Bhattacharjee, Pallab
    Sharnagat, Rahul
    Khatri, Jyotsana
    Kanojia, Diptesh
    Bhattacharyya, Pushpak
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4467 - 4476