Training MEMM with PSO: A tool for part-of-speech tagging

Cited by: 0
Authors
La, Lei [1 ]
Guo, Qiao [1 ]
Cao, Qimin [1 ]
Affiliations
[1] School of Automation, Beijing Institute of Technology, Beijing
Keywords
Dynamic global mutation probability; Maximum entropy Markov models; Part-of-speech; Particle swarm optimization; Text mining
DOI
10.4304/jsw.7.11.2511-2517
Abstract
Maximum Entropy Markov Models (MEMM) avoid the independence assumption of traditional Hidden Markov Models (HMM) and can therefore exploit context information in most text mining tasks. Because the convergence rate of the classic Generalized Iterative Scaling (GIS) algorithm is too low to be tolerated, researchers have proposed many improved methods for parameter training in MEMM, such as IIS, SCGIS, and L-BFGS. However, these methods sometimes do not meet task requirements for efficiency and robustness. This article modifies the traditional Particle Swarm Optimization (PSO) algorithm with a dynamic global mutation probability (DGMP) to address the local optimum and infinite-loop problems, and uses the modified PSO to estimate the parameters of an MEMM. We apply the MEMM trained by the modified PSO to Chinese Part-of-Speech (POS) tagging, analyze the experimental results, and find that it achieves a higher convergence rate and accuracy than the traditional MEMM. © 2012 ACADEMY PUBLISHER.
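The abstract describes replacing iterative-scaling training with a PSO variant that adds a dynamic global mutation probability (DGMP). Below is a minimal sketch of that idea, assuming a linearly decaying mutation schedule and a generic log-likelihood objective; the function names, hyperparameters, and the toy objective are illustrative stand-ins, not the paper's actual MEMM training setup.

```python
import numpy as np

def pso_dgmp(objective, dim, n_particles=30, n_iter=200,
             w=0.7, c1=1.5, c2=1.5, p_mut_max=0.3, seed=0):
    """Maximize `objective` with PSO plus a decaying global mutation probability."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    # Personal and global bests.
    pbest_pos = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = np.argmax(pbest_val)
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]

    for t in range(n_iter):
        # Standard velocity/position update with inertia and two attraction terms.
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = pos + vel

        # Dynamic global mutation probability (assumed schedule): high early for
        # exploration, decaying toward zero so the swarm can settle late.
        p_mut = p_mut_max * (1.0 - t / n_iter)
        mutate = rng.random(n_particles) < p_mut
        pos[mutate] += rng.normal(scale=0.5, size=(mutate.sum(), dim))

        # Update personal and global bests.
        vals = np.array([objective(p) for p in pos])
        improved = vals > pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmax(pbest_val)
        if pbest_val[g] > gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]

    return gbest_pos, gbest_val


if __name__ == "__main__":
    # Toy stand-in objective (a concave quadratic). In a real MEMM setting the
    # objective would be the conditional log-likelihood of the training
    # sequences as a function of the feature weights.
    target = np.array([0.3, -0.7, 1.2])
    loglike = lambda lam: -np.sum((lam - target) ** 2)
    best_lam, best_val = pso_dgmp(loglike, dim=3)
    print(best_lam, best_val)
```

The decaying mutation probability is the point of the DGMP modification: random perturbations keep particles from collapsing into a local optimum early on, while the shrinking probability lets the swarm converge in later iterations.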
Pages: 2511-2517
Number of pages: 6