An Empirical Study of Encoders and Decoders in Graph-Based Dependency Parsing

Cited by: 1
Authors
Wang, Ge [1 ,2 ,3 ]
Hu, Ziyuan [1 ]
Hu, Zechuan [1 ]
Tu, Kewei [1 ]
Affiliations
[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Shanghai 200050, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dependency parsing; high-order model; neural network;
DOI
10.1109/ACCESS.2020.2974109
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Graph-based dependency parsing consists of two steps: first, an encoder produces a feature representation for each parsing substructure of the input sentence, which is then used to compute a score for the substructure; second, a decoder finds the parse tree whose substructures have the largest total score. Over the past few years, powerful neural techniques have been introduced into the encoding step, substantially increasing parsing accuracy. However, advanced decoding techniques, in particular high-order decoding, have seen a decline in usage. It is widely believed that contextualized features produced by neural encoders can capture high-order information and hence diminish the need for a high-order decoder. In this paper, we empirically evaluate combinations of neural and non-neural encoders with first- and second-order decoders and provide a comprehensive analysis of the effectiveness of these combinations across varied training data sizes. We find that: first, with large training data, a strong neural encoder with first-order decoding is sufficient to achieve high parsing accuracy and only slightly lags behind the combination of neural encoding and second-order decoding; second, with small training data, a non-neural encoder with a second-order decoder outperforms the other combinations in most cases.
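To make the two-step pipeline in the abstract concrete, the sketch below illustrates first-order (arc-factored) decoding: given a matrix of arc scores produced by some encoder, the decoder returns the dependency tree with the largest total arc score. For clarity it uses brute-force enumeration over head assignments rather than the efficient algorithms actually used in parsers (Eisner's algorithm or Chu-Liu/Edmonds); the function names and score layout here are illustrative assumptions, not the paper's implementation.

```python
from itertools import product

def is_tree(heads):
    """Check that the head assignment forms a valid tree rooted at token 0.

    heads[i] is the head of token i+1 (0 denotes the artificial ROOT).
    Every token must reach ROOT without revisiting a node (no cycles).
    """
    n = len(heads)
    for i in range(n):
        seen = set()
        node = i + 1
        while node != 0:
            if node in seen:        # cycle detected
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def decode_first_order(scores):
    """Arc-factored decoding by exhaustive search (tiny sentences only).

    scores[h][d] is the encoder-produced score of the arc h -> d,
    with h in 0..n (0 = ROOT) and d in 1..n.
    Returns the head assignment maximizing the total arc score.
    """
    n = len(scores) - 1
    best_score, best_heads = float("-inf"), None
    # Enumerate every possible head for every token.
    for heads in product(range(n + 1), repeat=n):
        if any(heads[d - 1] == d for d in range(1, n + 1)):
            continue                # no self-loops
        if not is_tree(list(heads)):
            continue                # skip cyclic / disconnected assignments
        total = sum(scores[heads[d - 1]][d] for d in range(1, n + 1))
        if total > best_score:
            best_score, best_heads = total, list(heads)
    return best_heads, best_score

# Two-token sentence: ROOT -> token1 (score 1), token1 -> token2 (score 2).
arc_scores = [
    [0, 1, 0],   # from ROOT
    [0, 0, 2],   # from token 1
    [0, 0, 0],   # from token 2
]
heads, score = decode_first_order(arc_scores)
print(heads, score)  # → [0, 1] 3
```

A second-order decoder would instead score pairs of arcs (e.g., sibling or grandparent structures) jointly, so the total score is no longer a sum over independent arcs; this is the distinction the paper's experiments probe.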
Pages: 35770-35776
Page count: 7