Transform, Combine, and Transfer: Delexicalized Transfer Parser for Low-resource Languages

Cited by: 1
Authors
Das, Ayan [1]
Sarkar, Sudeshna [1]
Affiliations
[1] Indian Inst Technol Kharagpur, Dept Comp Sci & Engn, Kharagpur 721302, W Bengal, India
Keywords
Transfer parsing; delexicalization; cross-lingual transfer parsing; syntax; dependency parsing;
DOI
10.1145/3325886
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transfer parsing is used to build dependency parsers for languages that have no treebank by transferring from the treebanks of other languages (source languages). In delexicalized transfer, words are replaced by their part-of-speech tags. Transfer parsing may not work well if a language does not follow a uniform syntactic structure across its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source language treebank so as to reduce the syntactic differences between the source and target languages. We propose a transformation method in which a source language pattern is transformed stochastically into one of the multiple possible patterns followed in the target language. The transformed source language treebank can then be used to train a delexicalized parser for the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers. We also show that, in the multi-source setting, parsers trained on a concatenation of transformed source language treebanks work better when a subset of the source language treebanks is used than when all of them or only one is used. However, selecting the subset of treebanks whose combination gives the best-performing parser from the set of all available treebanks is hard. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained on the transformed treebanks.
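The two mechanical steps named in the abstract, delexicalization and greedy treebank selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The CoNLL-U column layout is standard, but the function names `delexicalize` and `greedy_select`, the `evaluate` callback (assumed to train a parser on the concatenation of the named treebanks and return its labelled attachment score on held-out data), and the stopping rule (keep a treebank only if the combined score improves) are all assumptions made for illustration. The abstract does not specify the exact selection procedure.

```python
from typing import Callable, Dict, List, Sequence


def delexicalize(conllu_line: str) -> str:
    """Replace the word form of a CoNLL-U token line with its UPOS tag."""
    # Comment lines and blank sentence separators pass through unchanged.
    if not conllu_line.strip() or conllu_line.startswith("#"):
        return conllu_line
    cols = conllu_line.split("\t")
    if len(cols) != 10:  # not a well-formed token line; leave it alone
        return conllu_line
    cols[1] = cols[3]  # FORM  <- UPOS
    cols[2] = "_"      # LEMMA is lexical information, so drop it
    return "\t".join(cols)


def greedy_select(
    single_source_las: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> List[str]:
    """Pick a subset of source treebanks greedily (assumed procedure).

    Treebanks are visited in descending order of the LAS of their
    single-source parsers; one is kept only if adding it raises the
    score returned by the hypothetical `evaluate` callback.
    """
    ranked = sorted(single_source_las, key=single_source_las.get, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    for name in ranked:
        score = evaluate(selected + [name])
        if score > best:
            selected.append(name)
            best = score
    return selected
```

In practice `evaluate` would wrap parser training and scoring, which dominates the cost. The greedy pass calls it once per candidate treebank instead of once per subset.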
Pages: 30