Segment-based Hidden Markov Models for Information Extraction

Cited by: 0
Authors
Gu, Zhenmei [1 ]
Cercone, Nick [1 ]
Affiliation
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Source
COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE | 2006
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hidden Markov models (HMMs) are powerful statistical models that have been applied successfully to Information Extraction (IE). In current approaches to applying HMMs to IE, an HMM is used to model text at the document level. This modelling can cause undesired redundancy in extraction, in the sense that more than one filler is identified and extracted. We propose to use HMMs to model text at the segment level, in which the extraction process consists of two steps: a segment retrieval step followed by an extraction step. In order to retrieve extraction-relevant segments from documents, we introduce a method to use HMMs to model and retrieve segments. Our experimental results show that the resulting segment HMM IE system not only achieves near-zero extraction redundancy, but also has better overall extraction performance than traditional document HMM IE systems.
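The two-step process described in the abstract (retrieve a segment, then extract from it) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state toy HMM, its hand-set probabilities, the out-of-vocabulary floor, the example document, and in particular the retrieval rule (per-token forward log-likelihood), since the abstract does not specify how segments are scored.

```python
import math

# Hypothetical toy HMM with hand-set probabilities (not from the paper).
STATES = ("background", "target")
START = {"background": 0.8, "target": 0.2}
TRANS = {
    "background": {"background": 0.7, "target": 0.3},
    "target": {"background": 0.4, "target": 0.6},
}
EMIT = {
    "background": {"the": 0.3, "talk": 0.2, "by": 0.2, "is": 0.2, "at": 0.1},
    "target": {"dr": 0.4, "smith": 0.4, "room": 0.1, "101": 0.1},
}
FLOOR = 1e-4  # assumed probability for out-of-vocabulary words


def log_emit(state, word):
    return math.log(EMIT[state].get(word, FLOOR))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(tokens):
    """Log-likelihood of a token sequence under the toy HMM (forward algorithm)."""
    alpha = {s: math.log(START[s]) + log_emit(s, tokens[0]) for s in STATES}
    for word in tokens[1:]:
        alpha = {
            s: log_emit(s, word)
            + logsumexp([alpha[p] + math.log(TRANS[p][s]) for p in STATES])
            for s in STATES
        }
    return logsumexp(list(alpha.values()))


def viterbi(tokens):
    """Most likely state sequence for the tokens (Viterbi decoding)."""
    score = {s: math.log(START[s]) + log_emit(s, tokens[0]) for s in STATES}
    backpointers = []
    for word in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            new_score[s] = (
                score[best_prev] + math.log(TRANS[best_prev][s]) + log_emit(s, word)
            )
        backpointers.append(pointers)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def retrieve_segment(segments):
    """Step 1 (assumed rule): keep the segment with the best per-token log-likelihood."""
    return max(segments, key=lambda seg: forward_loglik(seg) / len(seg))


def extract_filler(segments):
    """Step 2: decode only the retrieved segment and keep its target-state tokens."""
    segment = retrieve_segment(segments)
    path = viterbi(segment)
    return [word for word, state in zip(segment, path) if state == "target"]


# Hypothetical document, already split into tokenized segments.
document = [
    ["the", "talk", "is", "by", "dr", "smith"],
    ["see", "dr", "jones", "afterwards"],
    ["lunch", "will", "be", "served"],
]
filler = extract_filler(document)
print(filler)
```

Because decoding is restricted to the single retrieved segment, the sketch returns one filler per document even though another segment also contains a plausible candidate token.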
Pages: 481-488
Page count: 8