Text mining, names and security

Cited by: 5
Authors
Thompson, P [1]
Affiliation
[1] Dartmouth Coll, Hanover, NH 03755 USA
Keywords
co-reference; multiple hypothesis tracking; name matching; natural language processors
DOI
10.4018/jdm.2005010104
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A Process Query System, a new approach to representing and querying multiple hypotheses, is proposed for cross-document co-reference and linking based on existing entity extraction, co-reference and database name-matching technologies. A crucial component of linking entities across documents is the ability to recognize when different name strings are potential references to the same entity. Given the extraordinary range of variation international names can take when rendered in the Roman alphabet, this is a daunting task. The extension of name variant matching to free text will add important text mining functionality for intelligence and security informatics' toolkits.
Pages: 54-59
Page count: 6
Related papers
17 in total
  • [1] BAGGA A, 1998, P 36 ANN M ASS COMP, P79
  • [2] Bagga Amit, 1998, P 1 INT C LANG RES E, P563
  • [3] BALUJA S, 1999, P PAC ASS COMP LING
  • [4] An algorithm that learns what's in a name
    Bikel, DM
    Schwartz, R
    Weischedel, RM
    [J]. MACHINE LEARNING, 1999, 34 (1-3) : 211 - 231
  • [5] Borthwick A., 1998, P 7 MESS UND C MUC 7
  • [6] Collins M, 2002, 40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P489
  • [7] GRISHMAN R, 1999, P 16 INT C COMP LING
  • [8] HOLMES D, 2002, P 2002 IEEE INT C IN
  • [9] JIANG G, 2004, P DEF SEC S
  • [10] LILJENSTAM M, 2003, SIMULATING REALISTIC