The Tembusu Treebank: An English Learner Treebank

Authors
da Costa, Luis Morgado [1]
Bond, Francis [1]
Winder, Roger V. P. [2]
Affiliations
[1] Palacky Univ Olomouc, Asian Studies, Olomouc, Czech Republic
[2] Nanyang Technol Univ, LCC, Singapore, Singapore
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
treebank; learner corpus; error detection; error diagnosis; parsing
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper reports on the creation and development of the Tembusu Learner Treebank, an open treebank built from the NTU Corpus of Learner English and unique in incorporating mal-rules in the annotation of ungrammatical sentences. It describes the motivation behind the treebank and its development, as well as its use in building a new parse-ranking model for the English Resource Grammar, designed to improve parse selection for ungrammatical sentences and to diagnose these sentences through mal-rules. The corpus contains 25,000 sentences, of which 4,900 are treebanked. The paper concludes with an evaluation experiment that shows the usefulness of this new treebank for grammatical error detection and diagnosis.
Pages: 4817-4826
Number of pages: 10