Research on Semantic Disambiguation in Treebank

Cited by: 0
Authors
Miao, Lin [1]
Lv, Xueqiang [1]
Wu, Yunfang [2]
Wang, Yue [1]
Affiliations
[1] Beijing Informat Sci & Technol Univ, Beijing Key Lab Internet Culture & Digital Dissem, Beijing 100101, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Source
WEB TECHNOLOGIES AND APPLICATIONS (APWEB 2015), 2015, Vol. 9313
Keywords
Treebank; Data sparseness; Semantic disambiguation; Cilin
DOI
10.1007/978-3-319-25255-1_54
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As natural language processing technology is applied ever more widely, parsing plays an increasingly significant role, and the size and quality of treebanks have therefore become a focus of related research. However, treebanks used for parsing suffer from data sparseness. Using semantic information from Cilin together with the contextual information of words, this paper proposes a context-based lexical semantic disambiguation method. After the method was applied to CTB (Chinese Treebank) 5.0 and TCT (Tsinghua Chinese Treebank), the Berkeley Parser achieved relatively good results. On the Penn Chinese Treebank, precision and recall reached 85.35% and 84.34% respectively, and the F-value reached 84.84%. Compared with parsing on the original corpus, precision increased by 1.86%, recall by 1.02%, and the overall F-value by 1.35%. As a consequence, the overall parsing error rate dropped by 8.17%.
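The reported F-value follows from the reported precision and recall if it is the balanced F-measure (F1, the harmonic mean of the two), and the 8.17% figure reads naturally as a relative reduction in error rate, taking error rate as 100 minus F. The short Python check below rests on those assumptions, plus the assumption that the 1.35 F gain is an absolute difference in percentage points; the function names are illustrative only.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall (in percent)."""
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(f_old: float, f_new: float) -> float:
    """Relative drop in error rate, where error rate = 100 - F (all in percent)."""
    err_old = 100.0 - f_old
    err_new = 100.0 - f_new
    return 100.0 * (err_old - err_new) / err_old


# Precision and recall reported for the Penn Chinese Treebank (in percent).
P, R = 85.35, 84.34
F = f_value(P, R)

# Assumption: the reported +1.35 F gain is absolute, so the baseline is F - 1.35.
F_BASELINE = F - 1.35

print(f"F = {F:.2f}")
print(f"relative error-rate reduction = {relative_error_reduction(F_BASELINE, F):.1f}%")
```

It prints `F = 84.84`, matching the abstract, and a relative error-rate reduction of about 8.2%, in line with the reported 8.17% once rounding of the published figures is taken into account.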
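The abstract does not spell out how the sense of an ambiguous word is chosen. As a rough illustration of what context-based sense selection over a hierarchical thesaurus like Cilin can look like, the sketch below picks, for an ambiguous word, the candidate semantic code that shares a category prefix with the most codes of nearby words (in such hierarchies, a longer shared code prefix means a closer category). The toy lexicon, its codes, the English example words, and the `window` and `depth` parameters are all hypothetical; this is not the authors' algorithm and the codes are not real Cilin entries.

```python
from typing import Dict, List, Sequence

# Hypothetical toy lexicon with Cilin-style hierarchical codes (NOT real Cilin data).
LEXICON: Dict[str, List[str]] = {
    "bank": ["Dm04A01", "Be05B02"],  # financial institution vs. river bank
    "money": ["Dm04B03"],
    "loan": ["Dm04C01"],
    "river": ["Be05A01"],
    "water": ["Be05C04"],
}


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two codes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def disambiguate(tokens: Sequence[str], index: int, window: int = 3, depth: int = 4) -> str:
    """Choose the code of tokens[index] best supported by codes of nearby words."""
    word = tokens[index]
    candidates = LEXICON.get(word, [])
    if not candidates:
        raise KeyError(f"{word!r} is not in the lexicon")
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    # Collect the codes of every known word inside the context window.
    context_codes = [c for i in range(lo, hi) if i != index for c in LEXICON.get(tokens[i], [])]

    def score(code: str) -> int:
        # Count context codes that fall in the same category down to `depth` characters.
        return sum(1 for c in context_codes if shared_prefix_len(code, c) >= depth)

    # max() returns the first maximal candidate, so ties go to the first listed code.
    return max(candidates, key=score)


print(disambiguate(["the", "bank", "gave", "a", "loan"], 1))  # Dm04A01
print(disambiguate(["water", "by", "the", "bank"], 3))        # Be05B02
```

With `loan` nearby, the financial code wins; with `water` nearby, the river-bank code wins. A practical system would also need a policy for words missing from the thesaurus and for contexts that give no evidence at all.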
Pages: 658-669
Page count: 12