Transform, Combine, and Transfer: Delexicalized Transfer Parser for Low-resource Languages

Cited by: 1
Authors
Das, Ayan [1]
Sarkar, Sudeshna [1]
Affiliation
[1] Indian Inst Technol Kharagpur, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Transfer parsing; delexicalization; cross-lingual transfer parsing; syntax; dependency parsing;
DOI
10.1145/3325886
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transfer parsing has been used to develop dependency parsers for languages with no treebank by transferring from treebanks of other languages (source languages). In delexicalized transfer, the words are replaced by their part-of-speech tags. Transfer parsing may not work well if a language does not follow a uniform syntactic structure across its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source language treebank so as to reduce the syntactic differences between the source and the target languages. We propose a transformation method in which a source language pattern is transformed stochastically into one of the multiple possible patterns followed in the target language. The transformed source language treebank can be used to train a delexicalized parser for the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers. We also show that, in multi-source settings, parsers trained on a concatenation of transformed source language treebanks work better when a subset of the source language treebanks is used than when all of them are concatenated or only one is used. However, selecting, from the set of all available treebanks, the subset whose combination gives the best-performing parser is a hard problem. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained on the treebanks after transformation.
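The abstract does not spell out the greedy selection heuristic, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that source treebanks are ranked by the labelled attachment score (LAS) of their single-source parsers and that a treebank is added to the combination only if a score for the combined parser improves. The function name `greedy_select` and the `evaluate` callback (standing in for "train on the concatenation and score it") are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def greedy_select(
    single_source_las: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> List[str]:
    """Greedily pick a subset of source treebanks (illustrative sketch).

    single_source_las: LAS of the single-source parser for each treebank.
    evaluate: hypothetical callback returning a score for a parser trained
        on the concatenation of the given treebanks.
    """
    # Assumption: consider treebanks in decreasing order of single-source LAS.
    ranked = sorted(single_source_las, key=single_source_las.get, reverse=True)

    selected: List[str] = []
    best_score = float("-inf")
    for treebank in ranked:
        candidate = selected + [treebank]
        score = evaluate(candidate)
        # Assumption: keep the treebank only if the combined score improves.
        if score > best_score:
            selected, best_score = candidate, score
    return selected
```

With n source treebanks this calls `evaluate` n times, whereas an exhaustive search over all non-empty subsets would need 2^n - 1 calls.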
Pages: 30