Named Entity Corpus Construction using Wikipedia and DBpedia Ontology

Cited by: 0
Authors
Hahm, Younggyun [1]
Park, Jungyeul [2]
Lim, Kyungtae [3]
Kim, Youngsik [3]
Hwang, Dosam [4]
Choi, Key-Sun [1, 3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Taejon, South Korea
[2] Univ Rennes 1, IRISA, UMR 6074, Lannion, France
[3] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[4] Yeungnam Univ, Dept Comp Sci, Gyongsan, Gyeongsangbuk D, South Korea
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
Corpus; Named Entity Recognition; Linked Data
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification code
030303; 0501; 050102
Abstract
In this paper, we propose a novel method to automatically build a named entity (NE) corpus based on the DBpedia ontology. Most named entity recognition (NER) systems require training data produced through time- and effort-consuming annotation tasks, so work on NER has thus far been limited to certain languages, such as English, that are generally resource-abundant. As an alternative, we suggest that the NE corpus generated by our proposed method can be used as training data. Our approach uses Wikipedia as raw text and the DBpedia data set for named entity disambiguation. Our method is language-independent and can easily be applied to the many languages for which Wikipedia and DBpedia are available. Throughout the paper, we demonstrate that our NE corpus is of comparable quality even to a manually annotated NE corpus.
Pages: 2565-2569
Number of pages: 5
Related papers
50 items in total
  • [41] Telugu named entity recognition using bert
    Gorla, SaiKiranmai
    Tangeda, Sai Sharan
    Neti, Lalita Bhanu Murthy
    Malapati, Aruna
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2022, 14 (02) : 127 - 140
  • [43] A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News
    Jabbari, Ali
    Sauvage, Olivier
    Zeine, Hamada
    Chergui, Hamza
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2293 - 2299
  • [44] Named Entity Tagging a Very Large Unbalanced Corpus: Training and Evaluating NE Classifiers
    Bingel, Joachim
    Haider, Thomas
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2578 - 2583
  • [45] A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters
    Boros, Emanuela
    Romero, Veronica
    Maarand, Martin
    Zenklova, Katerina
    Kreckova, Jitka
    Vidal, Enrique
    Stutzmann, Dominique
    Kermorvant, Christopher
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 79 - 84
  • [47] Protein Named Entity Classification with Probabilistic Features Derived from GENIA Corpus and MEDLINE
    Sumathipala, Sagara
    Yamada, Koichi
    Unehara, Muneyuki
    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2014, : 1257 - 1261
  • [48] M-CNER: A Corpus for Chinese Named Entity Recognition in Multi-Domains
    Lu, Qi
    Yang, YaoSheng
    Li, Zhenghua
    Chen, Wenliang
    Zhang, Min
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4457 - 4461
  • [49] A Named Entity-Annotated Corpus of 19th Century Classical Commentaries
    Romanello, Matteo
    Najem-Meyer, Sven
    JOURNAL OF OPEN HUMANITIES DATA, 2024, 10
  • [50] A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products
    Schoen, Saskia
    Mironova, Veselina
    Gabryszak, Aleksandra
    Hennig, Leonhard
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4445 - 4451