Prompt-based for Low-Resource Tibetan Text Classification

Cited: 4
Author
An, Bo [1 ]
Affiliation
[1] Chinese Acad Social Sci, Inst Ethnol & Anthropol, South Twenty 7th St, Bldg 6, Zhongguancun Nandajie 2, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Tibetan text classification; prompt learning; deep learning; pre-trained language model;
DOI
10.1145/3603168
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Text classification is a critical and foundational task in Tibetan natural language processing; it plays a crucial role in applications such as sentiment analysis and information extraction. However, the limited availability of annotated data poses a significant challenge to Tibetan natural language processing. This paper proposes a prompt learning-based method for low-resource Tibetan text classification to overcome this challenge. The method uses pre-trained language models to learn text representation and generation capabilities from a large-scale unsupervised Tibetan corpus, enabling few-shot Tibetan text classification. Experimental results demonstrate that the proposed method significantly improves the performance of Tibetan text classification in low-resource scenarios. This work offers a new research direction and method for low-resource language processing, such as Tibetan natural language processing, and will hopefully inspire subsequent work on low-resource languages.
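The abstract describes the general prompt-learning recipe: wrap the input text in a cloze template, let a pre-trained masked language model fill the blank, and map predicted label words back to classes via a verbalizer. The sketch below illustrates that pipeline only; the template wording, the English label words, and the `toy_mlm` stand-in for the paper's pre-trained Tibetan model are all assumptions for illustration, not the paper's actual setup.

```python
# Illustrative sketch of prompt-based (cloze-style) text classification.
# The template, verbalizer entries, and toy scoring function are assumptions;
# the paper would use Tibetan templates and a Tibetan pre-trained model.

TEMPLATE = "{text} This article is about [MASK]."  # cloze-style prompt

# Verbalizer: map each class label to a label word predicted at [MASK].
VERBALIZER = {
    "politics": "politics",
    "economy": "economy",
    "culture": "culture",
}

def build_prompt(text: str) -> str:
    """Wrap the input text in the cloze template."""
    return TEMPLATE.format(text=text)

def classify(text: str, mask_word_scores) -> str:
    """Pick the class whose label word scores highest at the [MASK] slot.

    `mask_word_scores` stands in for a pre-trained masked language model:
    it maps a prompt string to {word: probability} at the [MASK] position.
    """
    scores = mask_word_scores(build_prompt(text))
    return max(VERBALIZER, key=lambda label: scores.get(VERBALIZER[label], 0.0))

def toy_mlm(prompt: str) -> dict:
    """Toy stand-in for an MLM head, so the sketch is self-contained."""
    return {"economy": 0.7, "politics": 0.2, "culture": 0.1}

print(classify("Stock markets rose sharply today.", toy_mlm))  # -> economy
```

Because only the template and verbalizer are task-specific, this setup needs no new classification head, which is what makes it attractive in the few-shot, low-resource setting the paper targets.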
Pages: 13
References (46 total)
  • [1] An Bo, 2022, Journal of Chinese Information Processing
  • [2] [Anonymous], 2012, 24 INT C COMP LING
  • [3] A Survey on Aspect-Based Sentiment Classification
    Brauwers, Gianni
    Frasincar, Flavius
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (04)
  • [4] Cai JJ, 2018, I COMP CONF WAVELET, P123, DOI 10.1109/ICCWAMTIP.2018.8632592
  • [5] Tibetan Text Classification Based on the Feature of Position Weight
    Cao, Hui
    Jia, Huiqiang
    [J]. 2013 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2013), 2013, : 220 - 223
  • [6] Emerging Trends: Word2Vec
    Church, Kenneth Ward
    [J]. NATURAL LANGUAGE ENGINEERING, 2017, 23 (01) : 155 - 162
  • [7] Pre-Training With Whole Word Masking for Chinese BERT
    Cui, Yiming
    Che, Wanxiang
    Liu, Ting
    Qin, Bing
    Yang, Ziqing
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3504 - 3514
  • [8] Plant recolonization in the Himalaya from the southeastern Qinghai-Tibetan Plateau: Geographical isolation contributed to high population differentiation
    Cun, Yu-Zhi
    Wang, Xiao-Quan
    [J]. MOLECULAR PHYLOGENETICS AND EVOLUTION, 2010, 56 (03) : 972 - 982
  • [9] Grave E, 2018, Arxiv, DOI [arXiv:1802.06893, DOI 10.48550/ARXIV.1802.06893, 10.48550/arxiv.1802.06893]
  • [10] Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]