Neural Text Categorization with Transformers for Learning Portuguese as a Second Language

Cited by: 2
Authors
Santos, Rodrigo [1]
Rodrigues, Joao [1]
Branco, Antonio [1]
Vaz, Rui [2]
Affiliations
[1] Univ Lisbon, Fac Ciencias, Dept Informat, NLX Nat Language & Speech Grp, P-1749016 Lisbon, Portugal
[2] Camoes IP Inst Cooperacao & Lingua, Av Liberdade 270, P-1250149 Lisbon, Portugal
Source
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021) | 2021 / Vol. 12981
Keywords
Readability classification; Language proficiency; Neural networks; Deep learning; Portuguese
DOI
10.1007/978-3-030-86230-5_56
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We report on the application of a neural-network-based approach to the problem of automatically categorizing texts according to their proficiency levels and suitability for learners of Portuguese as a second language. We resort to a particular deep learning architecture, namely Transformers, as we fine-tune GPT-2 and RoBERTa on data sets labeled with respect to the standard CEFR proficiency levels, which were provided by Camoes IC, the Portuguese official language institute. Despite the reduced size of the data sets available, we found that the resulting models outperform previous carefully crafted feature-based counterparts in most evaluation scenarios, thus offering a new state of the art for this task for the Portuguese language.
Pages: 715-726
Number of pages: 12