ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus

Authors
Habash, Nizar [1]
Palfreyman, David [2]
Affiliations
[1] New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
[2] Zayed University, Abu Dhabi, United Arab Emirates
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
Annotated Corpus; Learner Corpus; CEFR; Arabic; English
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates. We describe and discuss the various guidelines and pipeline processes we followed to create the annotations and quality check them. The annotations include spelling and grammar correction, morphological tokenization, Part-of-Speech tagging, lemmatization, and Common European Framework of Reference (CEFR) ratings. All of the annotations are done on Arabic and English texts using consistent guidelines as much as possible, with tracked alignments among the different annotations, and to the original raw texts. For morphological tokenization, POS tagging, and lemmatization, we use existing automatic annotation tools followed by manual correction. We also present various measurements and correlations with preliminary insights drawn from the data and annotations. The publicly available ZAEBUC corpus and its annotations are intended to be the stepping stones for additional annotations.
Pages: 79-88
Page count: 10