code-switching;
matrix language;
language modeling;
POS tagging;
DOI:
10.21437/Interspeech.2018-1284
Chinese Library Classification (CLC) number:
TP18 [Theory of artificial intelligence];
Subject classification codes:
081104;
0812;
0835;
1405;
Abstract:
Since the work of Joshi [1], most models of code-switching (C-S) have assumed asymmetry of the participating languages. While there exist patterns of language mixing in which a dominant or matrix language (ML) may not be discernible, these more complex signatures are rarely modeled [2, 3]. We use a series of metrics to characterize the switching in corpora as asymmetrical (insertional C-S) or symmetrical (alternational C-S). We test the efficacy of a linguistic model that assumes no ML in predicting the syntax of C-S in three Spanish-English corpora that vary according to whether the ML is Spanish, English, or indeterminate. Our results show that the same constraints on the grammatical junctures and on the directionality of switching hold irrespective of the symmetry of the data. The length of the alternating language spans varies according to POS, with noun phrases comprising the shortest spans. This suggests that insertional C-S may be subsumed under alternational C-S, as spontaneous borrowing. These results invite researchers to reconsider the linguistic theories they adopt and to expand the typology of training data used in creating language models and processing tools for C-S.