Multi-input Fusion Spelling Error Correction Model Based on Contrast Optimization

Cited by: 0
Authors
Wu Y. [1,2,3]
Huang R. [1,2,3]
Bai R. [1,2,3]
Cao J. [1,2,3]
Zhao J. [1,2,3]
Affiliations
[1] Engineering Research Center of Text Computing and Cognitive Intelligence, The Ministry of Education, Guizhou University, Guiyang
[2] State Key Laboratory of Public Big Data, Guizhou University, Guiyang
[3] College of Computer Science and Technology, Guizhou University, Guiyang
Funding
National Natural Science Foundation of China
Keywords
Chinese Spelling Error Correction; Complementary Semantic Fusion; Contrastive Learning Optimization; Multi-input Semantic Learning;
DOI
10.16451/j.cnki.issn1003-6059.202401007
Abstract
Chinese spelling correction is essential in text editing. Most existing Chinese spelling error correction models take a single input, which limits both the semantic information available to the model and the quality of its correction results. In this paper, a multi-input fusion spelling error correction method based on contrast optimization, MIF-SECCO, is proposed. MIF-SECCO consists of two stages: multi-input semantic learning and contrastive learning-driven semantic fusion correction. In the first stage, preliminary correction results from multiple single-input models are integrated to provide sufficient complementary semantic information for semantic fusion. In the second stage, the complementary sentence semantics are optimized with a contrastive learning approach to keep the model from over-correcting, and the fused semantics are used to re-correct erroneous sentences, alleviating the limitations of any single model's correction results. Experimental results on the public datasets SIGHAN13, SIGHAN14 and SIGHAN15 demonstrate that MIF-SECCO effectively improves correction performance. © 2024 Science Press. All rights reserved.
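The abstract describes a two-stage pipeline: several single-input correctors first produce preliminary corrections, and a fusion module then combines their complementary sentence semantics under a contrastive learning objective to re-correct the sentence while avoiding over-correction. The sketch below is only a rough illustration, in PyTorch, of how such a fusion module and a contrastive objective could be wired together; the names (info_nce, SemanticFusion) and all tensor shapes are assumptions made for illustration, not the authors' implementation.

import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    # Standard InfoNCE-style contrastive loss over sentence embeddings.
    # anchor, positive: (batch, hidden); negatives: (batch, k, hidden).
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1) / temperature                 # (batch,)
    neg_sim = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / temperature   # (batch, k)
    logits = torch.cat([pos_sim.unsqueeze(1), neg_sim], dim=1)                            # positive at index 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

class SemanticFusion(torch.nn.Module):
    # Stage-2 sketch: concatenate the encodings produced by several
    # single-input correctors and predict corrected characters.
    def __init__(self, hidden, vocab_size, num_inputs):
        super().__init__()
        self.proj = torch.nn.Linear(num_inputs * hidden, hidden)
        self.classifier = torch.nn.Linear(hidden, vocab_size)

    def forward(self, encodings):
        # encodings: list of num_inputs tensors, each (batch, seq, hidden)
        fused = torch.tanh(self.proj(torch.cat(encodings, dim=-1)))   # (batch, seq, hidden)
        return self.classifier(fused)                                 # (batch, seq, vocab_size)

In such a setup, the contrastive term would pull the fused representation toward the reference (correct) sentence and push it away from over-corrected variants, while the classifier head performs the character-level re-correction; how MIF-SECCO concretely builds the positive and negative pairs is described in the full paper, not in this sketch.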
Pages: 85-94
Number of pages: 9