Norm It! Lexical Normalization for Italian and Its Downstream Effects for Dependency Parsing

Cited by: 0
Authors
van der Goot, Rob [1]
Ramponi, Alan [1,2]
Caselli, Tommaso [4]
Cafagna, Michele [3,4]
De Mattei, Lorenzo [3]
Affiliations
[1] IT Univ Copenhagen, Copenhagen, Denmark
[2] Univ Trento, Trento, Italy
[3] Univ Pisa, Pisa, Italy
[4] Univ Groningen, Groningen, Netherlands
Keywords
Corpus (Creation, Annotation, etc.); Parsing, Grammar, Syntax, Treebank; Social Media Processing; Italian; Normalization
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Lexical normalization is the task of translating non-standard social media data to a standard form. Previous work has shown that this is beneficial for many downstream tasks in multiple languages. However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data. In this paper, we discuss the creation of a lexical normalization dataset for Italian. After two rounds of annotation, a Cohen's kappa score of 78.64 is obtained. During this process, we also analyze the inter-annotator agreement for this task, which is only rarely done on datasets for lexical normalization, and when it is reported, the analysis usually remains shallow. Furthermore, we utilize this dataset to train a lexical normalization model and show that it can be used to improve dependency parsing of social media data. All annotated data and the code to reproduce the results are available at: http://bitbucket.org/robvanderg/normit.
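The agreement figure in the abstract can be made concrete with a small sketch of how Cohen's kappa is computed for two annotators. The token-level labels below are invented for illustration (1 = the annotator marked the token as needing normalization, 0 = left as is) and are not taken from the dataset; the function name `cohens_kappa` is likewise a hypothetical helper, not the authors' code. Kappa is bounded above by 1, so the reported 78.64 is presumably expressed on a 0 to 100 scale.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for ten tokens (illustrative data only).
annotator_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.4f} ({100 * kappa:.2f} on a 0-100 scale)")
```

In this toy case the annotators agree on 8 of 10 tokens, but because most tokens are left unchanged by both, chance agreement is already 0.58, which is why kappa is considerably lower than the raw agreement rate.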
Pages: 6272-6278
Number of pages: 7