Corpus of Carbonate Platforms with Lexical Annotations for Named Entity Recognition

Times cited: 2
Authors
Hu, Zhichen [1]
Ren, Huali [2]
Jiang, Jielin [1]
Cui, Yan [4]
Hu, Xiumian [3]
Xu, Xiaolong [1]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Peoples R China
[2] Guangzhou Univ, Inst Artificial Intelligence & Blockchain, Guangzhou 515021, Peoples R China
[3] Nanjing Univ, Sch Earth Sci & Engn, Nanjing 210023, Peoples R China
[4] Nanjing Normal Univ Special Educ, Coll Math & Informat Sci, Nanjing 210023, Peoples R China
Source
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2023 / Vol. 135 / No. 01
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Named entity recognition; carbonate platform corpus; entity extraction; English literature detection;
DOI
10.32604/cmes.2022.022268
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
A clear challenge in named entity recognition is the construction of entity data sets. Although some research has been conducted on entity database construction, most of it is directed at Wikipedia, and a smaller share at structured entities such as persons, locations, and organizations in news text. This paper focuses on the identification of scientific entities in English literature, using carbonate platforms in sedimentology as the example. First, because literature in key disciplines is likely to be written by multidisciplinary experts, this paper designs a literature content extraction method that can deal with complex text structures. Second, based on the extracted literature content, we formalize the entity extraction task as lexicon and lexical-based entity extraction. Furthermore, to test the accuracy of entity extraction, three currently popular recognition methods are chosen to perform entity detection. Experiments show that the entity data set provided by the lexicon and lexical-based entity extraction method is of significant assistance for the named entity recognition task. This is a pilot study of entity extraction from structurally complex, specialized English literature on carbonate platforms.
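As an illustration of what lexicon-driven entity annotation can look like, the following is a minimal sketch and not the authors' method. It assumes a small hand-written lexicon and uses longest-match lookup to produce BIO tags. The lexicon entries, the labels `TERM` and `ROCK`, and the function names `tokenize` and `bio_tag` are all hypothetical. The lexical part of the approach is not described in the abstract and is not shown.

```python
import re

# Hypothetical lexicon: surface term -> entity label (illustrative only).
LEXICON = {
    "carbonate platform": "TERM",
    "reef": "TERM",
    "grainstone": "ROCK",
}


def tokenize(text):
    """Return lowercased alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bio_tag(tokens, lexicon):
    """Tag tokens with BIO labels using longest-match lexicon lookup."""
    # Index lexicon entries by their token tuples for fast lookup.
    entries = {tuple(term.split()): label for term, label in lexicon.items()}
    max_len = max(len(key) for key in entries)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No lexicon entry starts at this token.
            i += 1
    return tags


tokens = tokenize("The carbonate platform is rimmed by a reef.")
tags = bio_tag(tokens, LEXICON)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Tags of this kind could serve as training labels for a recognition model. Longest-match lookup is chosen here so that a multi-word term such as "carbonate platform" is tagged as one entity rather than as separate words.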
Pages: 91-108
Page count: 18