Cross-Lingual Universal Dependency Parsing Only From One Monolingual Treebank

Cited by: 4
Authors
Sun, Kailai [1 ]
Li, Zuchao [2 ]
Zhao, Hai [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, Dept Comp Sci & Engn,MoE Key Lab Artificial Intell, Key Lab Shanghai Educ Commiss Intelligent Interact, Shanghai 200240, Peoples R China
[2] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Task analysis; Data models; Syntactics; Annotations; Transfer learning; Silver; Universal dependency parsing; few-shot parsing; zero-shot parsing; cross-lingual language processing; self-training;
DOI
10.1109/TPAMI.2023.3291388
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Syntactic parsing is a highly linguistic processing task, and its parsers must be trained on treebanks produced through expensive human annotation. Because a treebank is unlikely to be available for every human language, this work proposes an effective cross-lingual UD parsing framework that transfers a parser trained on only one monolingual source treebank to any target language for which no treebank is available. To reach satisfactory parsing accuracy across quite different languages, we add two language modeling tasks to dependency parser training in a multi-task setup. Assuming that only unlabeled target-language data and the source treebank can be exploited together, we further adopt a self-training strategy within our multi-task framework to improve performance. We implement the proposed cross-lingual parsers for English, Chinese, and 29 UD treebanks. The empirical study shows that our cross-lingual parsers yield promising results for all target languages, approaching the performance of parsers trained on the corresponding target-language treebanks.
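The self-training strategy mentioned in the abstract follows a familiar pattern: train on the gold source treebank, parse unlabeled target-language text, and retrain on the confident ("silver") predictions. The sketch below illustrates only that generic loop under stated assumptions; it is not the authors' implementation, and it leaves out their two language modeling tasks. The names `self_train`, `train`, `rounds`, and `threshold` are hypothetical, and the confidence filtering is an assumed selection rule.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a sentence is a list of tokens, and a tree is a list of
# head indices (0 = root), one per token, as in UD-style dependency trees.
Sentence = List[str]
Tree = List[int]
Parser = Callable[[Sentence], Tuple[Tree, float]]  # returns (tree, confidence)
Trainer = Callable[[Sequence[Tuple[Sentence, Tree]]], Parser]


def self_train(
    train: Trainer,
    source_treebank: Sequence[Tuple[Sentence, Tree]],
    unlabeled_target: Sequence[Sentence],
    rounds: int = 2,
    threshold: float = 0.9,
) -> Parser:
    """Generic self-training loop (an illustrative assumption, not the paper's method)."""
    # Round 0: train only on gold trees from the single source treebank.
    parser = train(list(source_treebank))
    for _ in range(rounds):
        # Re-parse all unlabeled target sentences with the latest parser and
        # keep only predictions whose confidence clears the threshold.
        silver = []
        for sent in unlabeled_target:
            tree, confidence = parser(sent)
            if confidence >= threshold:
                silver.append((sent, tree))
        # Retrain on gold source trees plus this round's silver target trees
        # (silver data is regenerated each round, not accumulated).
        parser = train(list(source_treebank) + silver)
    return parser
```

In practice, `train` would wrap a real dependency parser, and the confidence score might come from, for example, the model's tree probability. Both choices are left open here because the abstract does not specify them.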
Pages: 13393-13407
Page count: 15