Developing a Thai Grammatical Error Correction Tool for Deaf Students

Cited by: 0
Authors
Traitruengsakul, Supachan [1]
Chuangsuwanich, Ekapol [1]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Bangkok 10310, Thailand
Source
IEEE Access | 2024 / Vol. 12
Keywords
Writing; Auditory system; Sign language; Error correction; Transformers; Social networking (online); Decoding; Data models; Detectors; Deafness; Deaf writing; Thai deaf corpus; grammatical error correction
DOI
10.1109/ACCESS.2024.3477611
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deaf students face challenges in written communication due to errors such as insertion, deletion, word order issues, misusage, and misspellings. Grammatical error correction (GEC) technology can help mitigate these issues. However, existing GEC models are primarily trained on online resources from second-language learners who are hearing. In contrast, sentences written by deaf students exhibit a variety of errors not typically found elsewhere. To address this, we aimed to create the Thai Deaf Corpus (TDC) from deaf students in grades 7-12 across four deaf schools. Our analysis of the TDC revealed that deaf students wrote short sentences, averaging six words each, used 4,585 unique words, and predominantly produced ungrammatical sentences. In addition, we introduce a two-stage system (Thai-GEC model) to automatically detect and correct incorrect words in ungrammatical sentences. In our experiments, we compared different detection and correction models on the dataset. As a result, off-the-shelf models perform poorly compared to models specifically created using our corpus, showing the usefulness of our dataset. The TDC is available at https://github.com/Supachan/ThaiDeafCorpus.git.
Pages: 153980-153999
Page count: 20