A Language Independent Approach for Named Entity Recognition in Subject Headings

Cited by: 0
Authors
Freire, Nuno [1]
Borbinha, Jose [1]
Calado, Pavel [1]
Affiliations
[1] Univ Tecn Lisboa, Inst Super Tecn, P-1049001 Lisbon, Portugal
Source
RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, TPDL 2011 | 2011 / Vol. 6966
Keywords
named entity recognition; subject headings; linked data; SKOS; machine learning
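DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract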
Subject headings systems are tools for the organization of knowledge that libraries have developed over the years. The Simple Knowledge Organization System (SKOS) has provided a practical way to represent subject headings systems using the Resource Description Framework, and several libraries have taken the initiative to make subject headings systems widely available as open linked data. Each individual subject heading describes a concept; however, in the majority of cases, one subject heading is actually a combination of several concepts, such as a topic bounded in geographical and temporal scopes. In these cases, the label of the concept carries several concepts that are not represented in structured form. Our work explores machine learning techniques to recognize the sub-concepts represented in the labels of SKOS subject headings. This paper describes a language-independent named entity recognition technique based on conditional random fields, a machine learning algorithm for sequence labelling. This technique was evaluated on a subset of the Library of Congress Subject Headings, where we measured the recognition of geographic concepts, topics, time periods and historical periods. Our technique achieved an overall F1 score of 0.98.
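The sequence-labelling setup mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a heading label might be turned into per-token, language-independent features of the kind a conditional random field could consume. The example label, the token pattern, the shape classes and the function names `tokenize`, `shape` and `features` are all assumptions made for illustration; the record does not state which features or tokenization the authors used.

```python
import re

# Hypothetical label, invented for illustration (not taken from the paper).
LABEL = "Lisbon (Portugal)--History--1974"

def tokenize(label: str) -> list[str]:
    """Split a label into digit runs, letter runs, '--' separators and single punctuation marks."""
    return re.findall(r"\d+|[^\W\d_]+|--|[^\w\s]", label)

def shape(token: str) -> str:
    """Map a token to a coarse shape class that does not depend on vocabulary."""
    if token.isdigit():
        return "DIGITS"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "LOWER"
    return "PUNCT"

def features(tokens: list[str], i: int) -> dict[str, str]:
    """Assumed feature set: the token's own shape plus the shapes of its neighbours."""
    return {
        "shape": shape(tokens[i]),
        "prev_shape": shape(tokens[i - 1]) if i > 0 else "BOS",
        "next_shape": shape(tokens[i + 1]) if i < len(tokens) - 1 else "EOS",
    }

tokens = tokenize(LABEL)
rows = [features(tokens, i) for i in range(len(tokens))]
for token, row in zip(tokens, rows):
    print(token, row)
```

A CRF trained on such feature rows would assign one label per token (for instance geographic, topic or time period), so that the sub-concepts in a single heading label can be recovered as separate spans. Because the features here describe only token shape and position, nothing in the sketch depends on the language of the label.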
Pages: 52-61
Page count: 10