Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach

Citations: 0
Authors
Pajic, Vesna [1]
Lazetic, Gordana Pavlovic [2]
Pajic, Milos [1]
Affiliations
[1] Univ Belgrade, Fac Agr, Nemanjina 6, Belgrade 11080, Serbia
[2] Univ Belgrade, Fac Math, Belgrade 11000, Serbia
Source
IMPLEMENTATION AND APPLICATION OF AUTOMATA | 2011 / Vol. 6807
Keywords
information extraction; finite state transducer; semi-structured resource; linguistic resource; bioinformatics; genome;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase is based on finite state transducers created for extracting information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided that its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database of genotype and phenotype characteristics of organisms.
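The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (blank-line-separated blocks whose first line is the organism name), the made-up genus "Examplea", the field name gc_content, and every function name are assumptions chosen for illustration. Phase 1 locates records from document structure; phase 2 runs a small hand-written finite state transducer over the tokens of each record.

```python
import re

# Hypothetical sample text; the organisms and values are invented.
SAMPLE = """Examplea alba
Cells are rods. The G+C content of the DNA is 58.5 mol percent.

Examplea nigra
Cells are cocci. The G+C content is 41 mol percent."""


def split_records(text):
    """Phase 1 (pre-processing): locate records using document structure.

    Assumption: records are blocks separated by blank lines.
    """
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def tokenize(text):
    """Split into word-like tokens (letters and '+') and numbers."""
    return re.findall(r"[A-Za-z+]+|\d+(?:\.\d+)?", text)


def classify(token):
    """Map a token to the input alphabet of the transducer."""
    if token == "G+C":
        return "key"
    if re.fullmatch(r"\d+(?:\.\d+)?", token):
        return "number"
    return "word"


# Phase 2: deterministic transducer.
# (state, input class) -> (next state, output field or None)
TRANSITIONS = {
    ("start", "word"): ("start", None),
    ("start", "number"): ("start", None),
    ("start", "key"): ("after_key", None),
    ("after_key", "word"): ("after_key", None),
    ("after_key", "key"): ("after_key", None),
    # The first number after the key phrase is emitted as the field value.
    ("after_key", "number"): ("start", "gc_content"),
}


def run_transducer(tokens):
    """Run the transducer and collect the emitted fields."""
    state, fields = "start", {}
    for token in tokens:
        state, field = TRANSITIONS[(state, classify(token))]
        if field is not None:
            fields[field] = token
    return fields


def extract(text):
    """Apply both phases and return one dict per located record."""
    rows = []
    for record in split_records(text):
        name, _, body = record.partition("\n")
        rows.append({"organism": name.strip(), **run_transducer(tokenize(body))})
    return rows


if __name__ == "__main__":
    for row in extract(SAMPLE):
        print(row)
```

Keeping the transition table separate from the record splitter mirrors the reuse claim in the abstract: the same table could in principle be applied to any document once pre-processing has reduced it to records, and modifying the extraction means editing the table only.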
Pages: 282 / +
Page count: 2
Related papers
17 in total
  • [1] Aho A. V., 1974, The design and analysis of computer algorithms
  • [2] [Anonymous], UNITEX 1 2 USER MANU
  • [3] [Anonymous], 2000, Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition
  • [4] Carlson A, 2008, LECT NOTES ARTIF INT, V5211, P195, DOI 10.1007/978-3-540-87479-9_31
  • [5] Inference of finite-state transducers from regular languages
    Casacuberta, F
    Vidal, E
    Picó, D
    [J]. PATTERN RECOGNITION, 2005, 38 (09) : 1431 - 1443
  • [6] Feng D., 2007, P 2007 JOINT C EMP M, P837
  • [7] Finite-state transducer cascades to extract named entities in texts
    Friburger, N
    Maurel, D
    [J]. THEORETICAL COMPUTER SCIENCE, 2004, 313 (01) : 93 - 104
  • [8] Garrity G.M., 2005, BERGEYS MANUAL SYSTE, V2
  • [9] Gross M., 1987, P LITP SPRING SCH TH
  • [10] Hobbs JR, 1997, LANG SPEECH & COMMUN, P383