Splitting Complex Sentences for Natural Language Processing Applications: Building a Simplified Spanish Corpus

Cited by: 8
Author
Camacho Collados, Jose [1]
Affiliation
[1] Univ Autonoma Barcelona, Barcelona 08290, Spain
Source
CORPUS RESOURCES FOR DESCRIPTIVE AND APPLIED STUDIES. CURRENT CHALLENGES AND FUTURE DIRECTIONS: SELECTED PAPERS FROM THE 5TH INTERNATIONAL CONFERENCE ON CORPUS LINGUISTICS (CILC2013) | 2013 / Vol. 95
Keywords
text simplification; syntactic simplification; parallel corpus; Spanish; natural language processing
DOI
10.1016/j.sbspro.2013.10.670
Chinese Library Classification
H [Language, Linguistics]
Discipline classification code
05
Abstract
This paper presents a new Spanish parallel corpus of original and syntactically simplified texts. The simplification consists mainly of splitting a complex original sentence into several simple ones wherever the opportunity arises. The corpus is envisioned as a first step toward an automatic syntactic simplification system that can serve as a preprocessing tool for other Natural Language Processing tasks such as text summarization, information extraction, parsing, or machine translation. Human annotators evaluated the corpus for grammaticality and preservation of meaning. The results suggest that the meaning of simplified and original sentences is almost identical. (C) 2013 The Authors. Published by Elsevier Ltd.
Pages: 464-472
Number of pages: 9