Towards Better Text Processing Tools for the Ainu Language

Cited by: 0
Authors
Nowakowski, Karol [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Affiliations
[1] Kitami Inst Technol, Dept Comp Sci, 165 Koen Cho, Kitami, Hokkaido 0908507, Japan
Source
HUMAN LANGUAGE TECHNOLOGY. CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2017 | 2020 / Vol. 12598
Keywords
Ainu language; Endangered languages; Under-resourced languages; Transcription normalization; Word segmentation; Tokenization; Part-of-speech tagging
DOI
10.1007/978-3-030-66527-2_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present our research devoted to the development of Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially with objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and using a state-of-the-art tool for training part-of-speech taggers.
Pages: 131-145
Page count: 15