Co-training an Unsupervised Constituency Parser with Weak Supervision

Cited by: 0
Authors
Maveli, Nickil [1]
Cohen, Shay B. [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Inst Language Cognit & Computat, 10 Crichton St, Edinburgh EH8 9AB, Midlothian, Scotland
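Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022) | 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract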
We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results.
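The abstract's distinction between the two classifier inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `span_views` and `right_branching_spans`, the half-open span convention, and the whitespace tokenization are all assumptions made for illustration only. The first function splits a sentence into an inside view (the span itself) and an outside view (everything around it); the second lists the spans a purely right-branching tree would contain, as one possible reading of the branching prior the abstract mentions.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) token indices (assumed convention)


def span_views(tokens: List[str], start: int, end: int):
    """Return the inside view (the span's tokens) and the outside view
    (left context, right context) for the span [start, end)."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span out of range")
    inside = tokens[start:end]
    outside = (tokens[:start], tokens[end:])
    return inside, outside


def right_branching_spans(n: int) -> List[Span]:
    """Spans of a purely right-branching binary tree over n tokens,
    excluding single-token spans. Hypothetical stand-in for a branching prior."""
    return [(i, n) for i in range(n - 1)]


if __name__ == "__main__":
    sentence = "the cat sat down".split()
    inside, outside = span_views(sentence, 1, 3)
    print("inside view:", inside)
    print("outside view:", outside)
    print("right-branching spans:", right_branching_spans(len(sentence)))
```

In a co-training setup of the kind the abstract describes, one classifier would be trained on features of the inside view and the other on features of the outside view, with each one's confident predictions labeling data for the other; the sketch above covers only how the two views are formed.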
Pages: 1274-1291
Number of pages: 18