Simultaneous feature and parameter selection using multiobjective optimization: application to named entity recognition

Cited by: 0

Authors:

Asif Ekbal

Sriparna Saha

Affiliation:

[1] Indian Institute of Technology, Department of Computer Science and Engineering

Source:

International Journal of Machine Learning and Cybernetics | 2016 / Vol. 7

Keywords:

Named entity recognition (NER); Feature selection; Parameter selection; Machine learning; Multiobjective optimization

DOI:

Not available

Abstract:

In this paper, we propose an efficient algorithm based on the concept of multiobjective optimization (MOO) for performing feature selection and parameter optimization for any machine learning technique. The choice of features and parameters has a significant effect on the accuracy of a classifier. We perform feature selection and parameter optimization for four different classifiers, namely conditional random field, support vector machine, memory-based learner and maximum entropy. The proposed algorithms are evaluated on named entity recognition, an important component of many text processing applications. We experiment with four languages, namely Bengali, Hindi, Telugu and English. First, the proposed MOO-based technique is used to determine the appropriate features and parameters. For each classifier, the algorithm produces a set of solutions on the final Pareto optimal front, where each solution represents a classifier with a particular feature and parameter combination. All these solutions are then combined using an MOO-based classifier ensemble technique. Evaluation results show that the proposed approach attains F-measure (harmonic mean of recall and precision) values of 90.48%, 90.44%, 78.71% and 88.68% for Bengali, Hindi, Telugu and English, respectively. We also show that, in all experimental settings, the proposed feature and parameter optimization technique performs better than baseline systems developed with random feature subsets. Comparisons with existing works also show the efficacy of the proposed algorithm.
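The abstract says that, for each classifier, the MOO search returns a set of solutions on the final Pareto optimal front, and it reports results as F-measure. The Python sketch below illustrates only those two notions and is not the authors' implementation. It assumes (an assumption, not stated in the abstract) that each candidate solution encodes a binary feature mask plus classifier parameter values and is scored on two objectives to be maximised, recall and precision. The class `Candidate`, the helpers `dominates`, `pareto_front` and `f_measure`, and all numbers in the example population are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical encoding of one solution: feature mask + parameters + scores."""
    features: Tuple[int, ...]   # 1 = feature used, 0 = feature dropped
    params: Tuple[float, ...]   # illustrative classifier parameter values
    recall: float               # objective 1 (assumed), to be maximised
    precision: float            # objective 2 (assumed), to be maximised


def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.recall >= b.recall and a.precision >= b.precision
    strictly_better = a.recall > b.recall or a.precision > b.precision
    return no_worse and strictly_better


def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate in the population dominates."""
    return [c for c in pop if not any(dominates(o, c) for o in pop if o is not c)]


# Made-up population of three candidate feature/parameter combinations.
population = [
    Candidate((1, 0, 1, 1), (2.0,), recall=0.88, precision=0.91),
    Candidate((1, 1, 1, 0), (3.0,), recall=0.92, precision=0.86),
    Candidate((0, 1, 1, 1), (2.0,), recall=0.85, precision=0.84),
]

for c in pareto_front(population):
    print(c.features, c.params, round(f_measure(c.recall, c.precision), 4))
```

Here the third candidate is dropped because the first one beats it on both recall and precision, while the first two trade recall against precision and therefore both stay on the front. The pairwise check is quadratic in the population size, which is adequate for a small illustrative population.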

Pages: 597-611

Number of pages: 14

Number of references: 49
