A Turkish-German Code-Switching Corpus

Cited by: 0

Authors:

Cetinoglu, Ozlem [1]

Affiliations:

[1] Univ Stuttgart, IMS, Stuttgart, Germany

Source:

LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016

Keywords:

code-switching; Turkish; German

DOI:

Not available

Chinese Library Classification:

H [Language, Writing]

Discipline classification code:

05

Abstract:

Bilingual communities often alternate between languages in both spoken and written communication. One such community, residents of Germany of Turkish origin, produces Turkish-German code-switching by heavily mixing the two languages at the discourse, sentence, or word level. Code-switching in general, and Turkish-German code-switching in particular, has long been studied from a linguistic perspective. Yet resources for studying it from a more computational perspective are limited, due to either small size or licence issues. In this work we contribute to solving this problem with a corpus. We present a Turkish-German code-switching corpus consisting of 1029 tweets, the majority of which contain intra-sentential switches. We share the different types of code-switching we have observed in our collection and describe our processing steps. The first step is data collection and filtering. This is followed by manual tokenisation and normalisation. Finally, we annotate the data with word-level language identification information. The resulting corpus is available for research purposes.

Pages: 4215-4220

Page count: 6
