Analyzing Code-mixing in Linguistic Corpora Using Kratylos

Authors
Finkel, Raphael [1]
Kaufman, Daniel [2, 3, 5]
Shamim, Ahmed [4]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] CUNY Queens Coll, Dept Linguist, Flushing, NY 11367 USA
[3] ELA, 3 West 18th St,6th Fl, New York, NY 10011 USA
[4] Univ Texas Austin, Dept Asian Studies, 120 Inner Campus Dr, Austin, TX 78712 USA
[5] Queens Coll, Dept Linguist, Flushing, NY 11367 USA
Source
ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE | 2022 / Vol. 15 / No. 01
Funding
National Science Foundation (USA);
Keywords
Language archives; linguistics; interlinear glossed texts; lexicons; language documentation; endangered languages;
DOI
10.1145/3480238
Chinese Library Classification
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
Code-switching, code-mixing, and, more generally, multilingualism pose technological challenges for language documentation, the sub-discipline of linguistics that deals with the annotation and basic analysis of field recordings and other primary data. We focus here on a case study involving code-mixing in the endangered Koda language, which poses special problems for morphosyntactic analysis. We offer a robust approach to multilingual annotations that combines the popular open-source software FieldWorks Language Explorer (FLEx) with Kratylos, a web-based corpus tool for display and query. Kratylos exposes linguistic data from various formats to powerful regular-expression queries that can exploit tier structure and other aspects of interlinear glossed text. We show how Kratylos can target mixed structures in our FLEx database of Koda that cannot be easily identified within the original FLEx software itself.
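The idea of running regular expressions over tiered interlinear glossed text can be illustrated with a minimal Python sketch. It is not Kratylos's query syntax or data model, which the abstract does not describe. The record layout, the `form|gloss|lang` serialization, the language tags `L1` and `L2`, and the names `serialize_word` and `mixed_words` are all assumptions made for illustration, and the forms are placeholders rather than Koda data.

```python
import re

# Hypothetical toy sentence: each word is a list of (form, gloss, language) triples,
# one triple per morpheme. The forms are placeholders, not real Koda data.
sentence = [
    [("stemA", "go", "L1"), ("-sufA", "PST", "L1")],
    [("stemB", "market", "L2"), ("-sufA", "LOC", "L1")],
    [("stemC", "big", "L2")],
]

def serialize_word(word):
    """Flatten the tiers of one word into 'form|gloss|lang' units joined by spaces."""
    return " ".join("|".join(morpheme) for morpheme in word)

# Assumed pattern: a morpheme tagged L2 immediately followed by a suffix tagged L1.
L2_THEN_L1_SUFFIX = re.compile(r"\|L2 -[^|\s]+\|[^|\s]+\|L1\b")

def mixed_words(words):
    """Return the indices of words in which an L2 morpheme precedes an L1 suffix."""
    return [
        index
        for index, word in enumerate(words)
        if L2_THEN_L1_SUFFIX.search(serialize_word(word))
    ]

print(mixed_words(sentence))
```

Putting the language tag in the same string as the form and gloss lets a single pattern constrain several tiers at once, which is the kind of cross-tier condition the abstract says is hard to express inside FLEx itself.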
Pages: 15