Portuguese Text Generation from Large Corpora

Cited by: 0
Authors
de Novais, Eder M. [1]
Paraboni, Ivandre [1]
da Silva Junior, Douglas F. P. [1]
Affiliations
[1] Univ Sao Paulo, Sch Arts Sci & Humanities, Sao Paulo, Brazil
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
Text Generation; Surface Realisation; Language Modelling;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
In the implementation of a surface realisation engine, many of the computational techniques seen in other AI fields have been widely applied. Among these, the use of statistical methods has been particularly successful, as in the so-called 'generate-and-select', or two-stage, architectures. Systems of this kind produce output strings from possibly underspecified input data by over-generating a large number of alternative realisations (often including ungrammatical candidate sentences). These are subsequently ranked with the aid of a statistical language model, and the most likely candidate is selected as the output string. Statistical approaches may, however, face a number of difficulties. Among these is the issue of data sparseness, a problem that is particularly evident in cases such as our target language, Brazilian Portuguese, which is not only morphologically rich but also relatively poor in NLP resources such as large, publicly available corpora. In this work we describe a first implementation of a shallow surface realisation system for this language that deals with the issue of data sparseness by making use of factored language models built from a (relatively) large corpus of Brazilian newspaper articles.
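The generate-and-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes a plain bigram model with add-one smoothing in place of the factored language models the paper uses, it over-generates by simply permuting the input words, and the corpus and all function names (`train_bigram_counts`, `log_prob`, `over_generate`, `realise`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import permutations

# Toy stand-in for a newspaper corpus (assumption: three made-up sentences).
TOY_CORPUS = [
    "o menino leu o livro",
    "a menina leu o jornal",
    "o menino comprou o jornal",
]


def train_bigram_counts(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = contexts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def over_generate(words):
    """Stage 1: produce every word order, including ungrammatical ones."""
    return sorted({" ".join(p) for p in permutations(words)})


def realise(words, contexts, bigrams, vocab_size):
    """Stage 2: rank the candidates with the language model and keep the best."""
    candidates = over_generate(words)
    return max(candidates, key=lambda c: log_prob(c, contexts, bigrams, vocab_size))


if __name__ == "__main__":
    contexts, bigrams, vocab_size = train_bigram_counts(TOY_CORPUS)
    print(realise(["livro", "o", "leu", "menino", "o"], contexts, bigrams, vocab_size))
```

Even in this toy setting, most candidates contain bigrams never seen in the corpus, which hints at why data sparseness becomes a serious problem for a morphologically rich language and motivates the factored models mentioned in the abstract.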
Pages: 4010-4014
Number of pages: 5