COMBINING TEXT CLASSIFIERS AND HIDDEN MARKOV MODELS FOR INFORMATION EXTRACTION

Times cited: 5
Authors
Barros, Flavia A. [1]
Silva, Eduardo F. A. [1]
Prudencio, Ricardo B. C. [1]
Filho, Valmir M. [1]
Nascimento, Andre C. A. [1]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil
Keywords
Information extraction; text classifiers; HMM
DOI
10.1142/S0218213009000147
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a hybrid machine learning approach to Information Extraction by combining conventional text classification techniques and Hidden Markov Models (HMM). A text classifier generates a (locally optimal) initial output, which is refined by an HMM, providing a globally optimal classification. The proposed approach was evaluated in two case studies and the experiments revealed a consistent gain in performance through the use of the HMM. In the first case study, the implemented prototype was used to extract information from bibliographic references, reaching a precision rate of 87.48% in a test set with 3000 references. In the second case study, the prototype extracted information from author affiliations, reaching a precision rate of 90.27% in a test set with 300 affiliations.
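The refinement step described in the abstract can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the hidden states are the true field labels, the observations are the labels predicted token by token by the text classifier, and the emission table plays the role of the classifier's confusion distribution. The label set, all probability values, and the function name viterbi_refine are hypothetical and chosen for illustration.

```python
import math

def viterbi_refine(predicted, start, trans, emit):
    """Return the most probable sequence of true labels given the
    classifier's per-token predictions (standard Viterbi decoding).

    Assumes every probability used is strictly positive; a real model
    would need smoothing to guarantee that.
    """
    if not predicted:
        return []
    labels = list(start)
    # Log-probability of the best path ending in each label at token 0.
    best = {y: math.log(start[y]) + math.log(emit[y][predicted[0]]) for y in labels}
    backpointers = []
    for obs in predicted[1:]:
        scores, ptrs = {}, {}
        for y in labels:
            # Best previous label for a path that ends in y at this token.
            prev = max(labels, key=lambda x: best[x] + math.log(trans[x][y]))
            scores[y] = best[prev] + math.log(trans[prev][y]) + math.log(emit[y][obs])
            ptrs[y] = prev
        best = scores
        backpointers.append(ptrs)
    # Trace the best path back from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path

# Hypothetical model for three bibliographic fields (illustrative values only).
START = {"AUTHOR": 0.8, "TITLE": 0.1, "YEAR": 0.1}
TRANS = {
    "AUTHOR": {"AUTHOR": 0.6, "TITLE": 0.35, "YEAR": 0.05},
    "TITLE": {"AUTHOR": 0.05, "TITLE": 0.7, "YEAR": 0.25},
    "YEAR": {"AUTHOR": 0.05, "TITLE": 0.05, "YEAR": 0.9},
}
# emit[true][predicted]: the classifier is assumed right 80% of the time.
EMIT = {
    "AUTHOR": {"AUTHOR": 0.8, "TITLE": 0.1, "YEAR": 0.1},
    "TITLE": {"AUTHOR": 0.1, "TITLE": 0.8, "YEAR": 0.1},
    "YEAR": {"AUTHOR": 0.1, "TITLE": 0.1, "YEAR": 0.8},
}

# Locally optimal classifier output with one isolated AUTHOR inside the title.
predicted = ["AUTHOR", "AUTHOR", "TITLE", "AUTHOR", "TITLE", "YEAR"]
refined = viterbi_refine(predicted, START, TRANS, EMIT)
print(refined)
# ['AUTHOR', 'AUTHOR', 'TITLE', 'TITLE', 'TITLE', 'YEAR']
```

In this toy example the classifier labels the fourth token AUTHOR in isolation, while the sequence model relabels it TITLE because a TITLE to AUTHOR to TITLE path is improbable under the assumed transition table. That is the sense in which a sequence-level decoder can turn locally optimal decisions into a globally optimal labeling.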
Pages: 311-329
Page count: 19