Classification of L2 Thesis Statement Writing Performance Using Syntactic Complexity Indices

Cited by: 0
Author
Uzun, Kutay [1]
Affiliation
[1] Trakya Univ, Dept English Language Teaching, Edirne, Turkey
Source
PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE COMPUTATIONAL LINGUISTICS IN BULGARIA (CLIB '20) | 2020
Keywords
L2 Writing Performance; Machine Learning; Syntactic Complexity; Thesis Statement; Performance Classification; LANGUAGE; PROFICIENCY; QUALITY; WRITERS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study primarily aimed to determine whether machine learning classification algorithms could accurately classify L2 thesis statement writing performance as high or low using syntactic complexity indices. A secondary aim was to reveal how the syntactic complexity indices from which the classification algorithms gained the most information interacted with L2 thesis statement writing performance. The data set consisted of 137 high-performing and 69 low-performing thesis statements written by undergraduate learners of English in a foreign language context. Experiments revealed that the Locally Weighted Learning algorithm could classify L2 thesis statement writing performance with 75.61% accuracy, 20.01% above the baseline. Balancing the data set via Synthetic Minority Oversampling produced the same accuracy with the Stochastic Gradient Descent algorithm, along with a slight increase in the Kappa statistic. In both the imbalanced and balanced data sets, the number of coordinate phrases, coordinate phrases per t-unit, coordinate phrases per clause and verb phrases per t-unit were the variables from which the classification algorithms gained the most information. Mann-Whitney U tests showed that the high-performing thesis statements had more coordinate phrases and higher ratios of coordinate phrases per t-unit and coordinate phrases per clause. The verb phrases per t-unit ratio was lower in high-performing thesis statements than in their low-performing counterparts.
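The abstract reports that balancing the data set left accuracy unchanged while the Kappa statistic rose slightly. The sketch below is not the study's code or data; it only illustrates, with made-up confusion matrices, how Cohen's Kappa can differ between two classifiers that have identical accuracy when the class balance differs. The function name `accuracy_and_kappa` and all counts are hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of items whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: proportion of items on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: for each class, (true share) * (predicted share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT from the study); rows = true class, cols = predicted.
imbalanced = [[80, 10],   # 90 "high" items
              [15, 15]]   # 30 "low" items
balanced = [[50, 10],     # 60 "high" items
            [15, 45]]     # 60 "low" items

acc_imb, kappa_imb = accuracy_and_kappa(imbalanced)
acc_bal, kappa_bal = accuracy_and_kappa(balanced)

print(f"imbalanced: accuracy={acc_imb:.3f}, kappa={kappa_imb:.3f}")
print(f"balanced:   accuracy={acc_bal:.3f}, kappa={kappa_bal:.3f}")
```

In this toy case both matrices give an accuracy of about 0.79, but Kappa is about 0.41 for the imbalanced matrix and about 0.58 for the balanced one, because chance agreement is lower when the classes are evenly sized.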
Pages: 42-52
Number of pages: 11