A Part-of-Speech Tagging Algorithm for Essay Written by Chinese English Learner

Authors
Tan Y.-M. [1]
Yang L. [1]
Hu D. [1]
Affiliation
[1] Intelligence Science and Technology Center, Beijing University of Posts and Telecommunications, Beijing
Source
Journal of Beijing University of Posts and Telecommunications, 2017, 40(2)
Keywords
Chinese English learner; Essays; Part-of-speech tagging; Word vector
DOI
10.13190/j.jbupt.2017.02.003
Abstract
A two-layer part-of-speech (POS) tagging algorithm based on word embeddings is proposed. The algorithm needs only a few hand-crafted features: most features are replaced by word embeddings and by the tag vectors obtained in the first layer. The tag set is split into two subsets, one for each layer. Tags that are easy to assign are tagged first, in the first layer; tags that are harder to assign, such as nouns and verbs, are tagged in the second layer. With this algorithm, the POS tagging accuracy on essays written by Chinese English learners improves from 95.23% to 95.63%, outperforming the previous state-of-the-art word-embedding-based results for POS tagging of such essays. © 2017, Editorial Department of Journal of Beijing University of Posts and Telecommunications. All rights reserved.
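The abstract describes the two-layer scheme only at a high level. The sketch below illustrates the idea under loudly stated assumptions: the tiny corpus, the 2-d embeddings (`EMB`), the split into "easy" tags (DT, IN, PRP, ".") and "hard" tags (NN, VB), the one-hot first-layer tag vectors and the nearest-centroid classifier (`NearestCentroid`) are all hypothetical stand-ins, not the authors' model, features or data. The first layer assigns easy tags from the word embedding alone and marks every other token as `HARD`; the second layer resolves `HARD` tokens from the word embedding plus the first-layer tag vectors of the neighbouring words, so no hand-crafted context features are needed.

```python
from collections import defaultdict

# Hypothetical 2-d "word embeddings"; a real system would use pretrained vectors.
EMB = {
    "the": (1.0, 0.0), "in": (0.0, 1.0), "I": (-1.0, 0.0), "we": (-0.9, 0.1),
    ".": (0.0, -1.0),
    "dog": (2.0, 2.1), "park": (2.1, 2.0), "runs": (1.9, 2.0),
    "play": (2.0, 2.0), "ends": (2.0, 1.9), "run": (1.95, 2.05),
}
HARD_TAGS = {"NN", "VB"}                           # assumed: deferred to layer 2
LAYER1_LABELS = ["DT", "IN", "PRP", ".", "HARD"]   # easy tags + placeholder


def embed(word):
    # Unknown words fall back to a zero vector (a crude assumption).
    return list(EMB.get(word, (0.0, 0.0)))


def one_hot(label):
    # First-layer "tag vector": one-hot over the layer-1 label set.
    return [1.0 if lab == label else 0.0 for lab in LAYER1_LABELS]


def to_layer1(tag):
    return "HARD" if tag in HARD_TAGS else tag


class NearestCentroid:
    """Stand-in classifier: predicts the label whose mean feature vector is closest."""

    def fit(self, xs, ys):
        sums, counts = {}, defaultdict(int)
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        # Squared Euclidean distance to each class centroid.
        return min(self.centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])))


def layer2_features(words, l1_labels, i):
    zero = [0.0] * len(LAYER1_LABELS)  # padding beyond the sentence edges
    prev_vec = one_hot(l1_labels[i - 1]) if i > 0 else zero
    next_vec = one_hot(l1_labels[i + 1]) if i + 1 < len(words) else zero
    # Word embedding plus the neighbours' first-layer tag vectors.
    return embed(words[i]) + prev_vec + next_vec


def train(corpus):
    x1, y1, x2, y2 = [], [], [], []
    for words, tags in corpus:
        l1 = [to_layer1(t) for t in tags]  # gold layer-1 labels (simplification)
        for i, w in enumerate(words):
            x1.append(embed(w))
            y1.append(l1[i])
            if tags[i] in HARD_TAGS:
                x2.append(layer2_features(words, l1, i))
                y2.append(tags[i])
    return NearestCentroid().fit(x1, y1), NearestCentroid().fit(x2, y2)


def tag(words, layer1, layer2):
    l1 = [layer1.predict(embed(w)) for w in words]        # layer 1: easy tags
    return [layer2.predict(layer2_features(words, l1, i)) if lab == "HARD" else lab
            for i, lab in enumerate(l1)]                  # layer 2: hard tags


CORPUS = [
    (["the", "dog", "runs", "."], ["DT", "NN", "VB", "."]),
    (["I", "play", "in", "the", "park", "."], ["PRP", "VB", "IN", "DT", "NN", "."]),
    (["the", "play", "ends", "."], ["DT", "NN", "VB", "."]),
    (["we", "run", "."], ["PRP", "VB", "."]),
]

layer1, layer2 = train(CORPUS)
print(tag(["the", "play", "."], layer1, layer2))  # ['DT', 'NN', '.']
print(tag(["we", "play", "."], layer1, layer2))   # ['PRP', 'VB', '.']
```

The ambiguous word "play" has the same embedding in both test sentences; only the first-layer tag of its left neighbour (DT versus PRP) pushes it towards NN or VB. A real system would presumably use trained neural or log-linear classifiers for both layers and could pass soft tag-probability vectors instead of one-hot ones.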
Pages: 16-20
Page count: 4