Arabic Transliteration of Romanized Tunisian Dialect Text: A Preliminary Investigation

Cited by: 13
Authors
Masmoudi, Abir [1 ,2 ]
Habash, Nizar [3 ]
Ellouze, Mariem [1 ]
Esteve, Yannick [2 ]
Belguith, Lamia Hadrich [1 ]
Affiliations
[1] Univ Sfax, MIRACL Lab, ANLP Res Grp, Sfax, Tunisia
[2] Univ Maine, LIUM, Paris, France
[3] New York Univ Abu Dhabi, Abu Dhabi, U Arab Emirates
Source
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I | 2015 / Vol. 9041
Keywords
Tunisian Dialect; corpus; transliteration; normalization; CODA
DOI
10.1007/978-3-319-18111-0_46
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we describe the process of converting Tunisian Dialect text written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by informal and non-standard usage such as repeated letters for emphasis, typos, non-standard abbreviations, and nonlinguistic content such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step, since most recently available tools for processing Arabic dialects expect Arabic script input.
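The kinds of noise named in the abstract (repeated letters for emphasis, emoticons) can be illustrated with a minimal cleanup sketch. This is not the authors' method; the function name `normalize_arabizi`, the emoticon pattern, and the three-repeat threshold are all assumptions made for illustration, and the sample words are invented.

```python
import re

# Hypothetical pattern covering only a few common ASCII emoticons such as :) ;-) :D.
# Digits are deliberately excluded as "eyes" because Arabizi uses digits (3, 7, 9) as letters.
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(\]\[DPp/\\|](?!\w)")

# Three or more repeats of the same character, e.g. "saluuut".
# Assumption: a run of two is left alone, since it may be a genuine doubled letter.
REPEAT_RE = re.compile(r"(.)\1{2,}")


def normalize_arabizi(text: str) -> str:
    """Strip simple emoticons and collapse emphatic letter repetition."""
    text = EMOTICON_RE.sub(" ", text)   # drop nonlinguistic emoticons
    text = REPEAT_RE.sub(r"\1", text)   # "aaaa" -> "a"
    return " ".join(text.split())       # tidy leftover whitespace


if __name__ == "__main__":
    print(normalize_arabizi("saluuut ya sa7biii :-)"))
```

A cleanup pass of this kind would only be a preprocessing step; mapping the cleaned Latin-script tokens to CODA-compliant Arabic script is the harder problem the paper addresses.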
Pages: 608-619
Page count: 12