Part-of-Speech Tagger for Malay Social Media Texts

Cited by: 7
Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Malaysia
Source
GEMA ONLINE JOURNAL OF LANGUAGE STUDIES | 2018 / Vol. 18 / Issue 04
Keywords
part-of-speech; informal Malay text; Malay POS tagger; Malay tweet; QTAG
DOI
10.17576/gema-2018-1804-09
Chinese Library Classification
H [Language, Linguistics]
Discipline classification code
05
Abstract
Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they exhibit distinct linguistic phenomena: dialects from each state in Malaysia, foreign terms borrowed into Malay, mixed languages, abbreviations, spelling errors and mistakes in sentence structure. Tagging the word class of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing works on Malay part-of-speech (POS) tagging are based only on standard Malay and formal texts and are thus unsuitable for tagging tweets. A POS tagging model for non-standardised Malay tweets must therefore be developed. This study aims to design and implement such a model and assesses it by word tagging accuracy on test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagger called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% on the normalised and unnormalised test datasets, respectively.
Pages: 124-142
Number of pages: 19