Simultaneous feature and parameter selection using multiobjective optimization: application to named entity recognition

Cited by: 0
Authors
Asif Ekbal
Sriparna Saha
Affiliations
[1] Indian Institute of Technology, Department of Computer Science and Engineering
Source
International Journal of Machine Learning and Cybernetics | 2016 / Vol. 7
Keywords
Named entity recognition (NER); Feature selection; Parameter selection; Machine learning; Multiobjective optimization
DOI
Not available
Abstract
In this paper, we propose an efficient algorithm based on multiobjective optimization (MOO) for performing feature selection and parameter optimization for any machine learning technique. The combination of features and parameters has a significant effect on the accuracy of a classifier. We perform feature selection and parameter optimization for four different classifiers, namely conditional random field, support vector machine, memory-based learner and maximum entropy. The proposed algorithms are evaluated on named entity recognition, an important component in many text processing applications. We currently experiment with four languages: Bengali, Hindi, Telugu and English. First, the proposed MOO-based technique is used to determine the appropriate features and parameters. For each classifier, the algorithm produces a set of solutions on the final Pareto optimal front, where each solution represents a classifier with a particular feature and parameter combination. All of these solutions are then combined using a MOO-based classifier ensemble technique. Evaluation results show that the proposed approach attains F-measure (harmonic mean of recall and precision) values of 90.48%, 90.44%, 78.71% and 88.68% for Bengali, Hindi, Telugu and English, respectively. We also show that, in all the experimental settings, the proposed feature and parameter optimization technique performs better than the baseline systems, which are built with random feature subsets. Comparisons with existing work also show the efficacy of the proposed algorithm.
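The two notions the abstract relies on, the F-measure and the Pareto optimal front, can be illustrated with a minimal Python sketch. It assumes two objectives, recall and precision, both maximized; the abstract does not state which objectives the authors optimize, so that choice, the function names and the candidate scores are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Assumption: each candidate is scored by (recall, precision), both maximized.
Objectives = Tuple[float, float]


def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision; 0.0 when both are zero."""
    if recall + precision == 0.0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for four candidate feature/parameter combinations.
candidates = [(0.80, 0.90), (0.85, 0.85), (0.70, 0.80), (0.90, 0.70)]
front = pareto_front(candidates)
```

Here the third candidate is dropped because the first one beats it on both recall and precision, while the remaining three trade one objective against the other and therefore all stay on the front. This is only the non-dominance filter; the search procedure that generates the candidates is not shown.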
Pages: 597-611
Number of pages: 14