CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform

Cited by: 0
Authors
Cui, Yue [1, 2, 3]
Zhu, Junhui [1, 2, 3]
Yang, Liner [1, 2, 3]
Fang, Xuezhi [1, 2, 3]
Chen, Xiaobin [4]
Wang, Yujie [5]
Yang, Erhong [1, 3]
Affiliations
[1] Beijing Language & Culture Univ, Natl Language Resources Monitoring & Res Ctr, Print Media Language Branch, Beijing 100083, Peoples R China
[2] Beijing Language & Culture Univ, Sch Informat Sci, Beijing 100083, Peoples R China
[3] Beijing Language & Culture Univ, Beijing Adv Innovat Ctr Language Resources, Beijing 100083, Peoples R China
[4] Tubingen Univ, Hector Inst Empir Bildungsforsch, D-72072 Tubingen, Germany
[5] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
Source
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022
Keywords
linguistic complexity analyzer; Chinese; text analysis; SYNTACTIC COMPLEXITY; READABILITY; LEVEL
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The construct of linguistic complexity has been widely used in language learning research, and several text analysis tools have been built to measure it automatically. However, the indexes supported by existing Chinese text analysis tools are limited and vary with the research purposes they were built for. CTAP is an open-source toolkit for extracting linguistic complexity measures, designed to serve any research purpose. Although it was originally developed for English, the Unstructured Information Management Architecture (UIMA) framework it uses allows other languages to be integrated. In this study, we integrate a Chinese component into CTAP, describe the index sets it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set comprises 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. So far, CTAP has implemented automatic calculation of complexity features for four languages, with the aim of helping linguists without an NLP background carry out research on language complexity.
Pages: 5525-5538
Number of pages: 14
Related papers
47 in total
  • [1] Conceptualizing and measuring short-term changes in L2 writing complexity
    Bulte, Bram
    Housen, Alex
    [J]. JOURNAL OF SECOND LANGUAGE WRITING, 2014, 26 : 42 - 65
  • [2] Cai J., 2020, THESIS
  • [3] Carpenter P. A., 1983, EYE MOVEMENTS READIN, P275, DOI 10.1016/B978-0-12-583680
  • [4] Chang L.P., 2012, J CHINESE LANGUAGE T, V9, P77, DOI 10.6393/JCLT.201206.0077
  • [5] Patient Flow Scheduling and Capacity Planning in a Smart Hospital Environment
    Chen, Xiao
    Wang, Liangmin
    Ding, Jie
    Thomas, Nigel
    [J]. IEEE ACCESS, 2016, 4 : 135 - 148
  • [6] Word frequency and readability: Predicting the text-level readability with a lexical-level attribute
    Chen, Xiaobin
    Meurers, Detmar
    [J]. JOURNAL OF RESEARCH IN READING, 2018, 41 (03) : 486 - 510
  • [7] CLEC, 2008, INT CURR CHIN LANG E
  • [8] Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners
    Crossley, Scott A.
    McNamara, Danielle S.
    [J]. JOURNAL OF SECOND LANGUAGE WRITING, 2014, 26 : 66 - 79
  • [9] The Development of Writing Proficiency as a Function of Grade Level: A Linguistic Analysis
    Crossley, Scott A.
    Weston, Jennifer L.
    Sullivan, Susan T. McLain
    McNamara, Danielle S.
    [J]. WRITTEN COMMUNICATION, 2011, 28 (03) : 282 - 311
  • [10] Deng Y., 2013, J FOREIGN LANGUAGES, P31