Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format

Cited by: 0
Authors
Arwidarasti, Jessica Naraiswari [1]
Alfina, Ika [1]
Krisnadhi, Adila Alfa [1]
Affiliation
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Source
2020 International Conference on Asian Language Processing (IALP 2020) | 2020
Keywords
compound word; Indonesian; multiword expression; Penn Treebank; Stanford parser
DOI
10.1109/ialp51396.2020.9310479
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multiword expressions (MWEs) have been a pain in the neck, especially when determining their word classes in a syntactic treebank. Previous work proposed annotation guidelines for Indonesian MWEs that align with the Penn Treebank (PTB) format. However, we think the proposed annotation still needs improvement. Therefore, this study proposes new guidelines for labeling Indonesian MWEs that conform to the PTB format. We also revised the MWE annotation of an existing Indonesian constituency treebank of 1,030 sentences to conform to the new guidelines. To evaluate the quality of the revised treebank, we built an Indonesian constituency parser model using the revised treebank and the Stanford parser. The experiments show that the resulting parser achieves an F1-score of 69.97%.
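The abstract reports parser quality as a single F1-score. Constituency parsers are conventionally scored with PARSEVAL-style labeled-bracket precision and recall (the scheme implemented by the evalb tool), but this record does not say which scorer or settings the authors used. The following is therefore a simplified sketch under that assumption: it compares (label, start, end) spans of phrasal nodes in PTB bracketed trees and excludes POS-tag (preterminal) nodes, but unlike evalb it does not ignore punctuation or apply any parameter-file options. The function names and the example trees and tags are hypothetical illustrations, not data from the revised treebank.

```python
from collections import Counter


def parse_ptb(s):
    """Parse a PTB bracketed string into nested (label, children) tuples; leaves are str."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse()


def brackets(tree):
    """Return a Counter of (label, start, end) spans for phrasal nodes (POS tags excluded)."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # preterminal covering one word: not scored
        end = start
        for child in children:
            end = walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f1(gold, predicted):
    """Labeled-bracket F1 between a gold tree and a predicted tree (simplified PARSEVAL)."""
    g = brackets(parse_ptb(gold))
    p = brackets(parse_ptb(predicted))
    matched = sum((g & p).values())  # multiset intersection of spans
    precision = matched / sum(p.values()) if p else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction fails to group "nasi goreng" into an NP.
gold_tree = "(S (NP (PRP Saya)) (VP (VB makan) (NP (NN nasi) (NN goreng))))"
pred_tree = "(S (NP (PRP Saya)) (VP (VB makan) (NN nasi) (NN goreng)))"
print(round(bracket_f1(gold_tree, pred_tree), 4))
```

Here the gold tree has four phrasal brackets and the prediction has three, all of which match, so precision is 1.0, recall is 0.75, and F1 is 6/7. Corpus-level scores such as the one in the abstract are usually computed by summing matched, gold, and predicted bracket counts over all sentences before taking the ratio, rather than averaging per-sentence F1.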
Pages: 75-80
Page count: 6