Part-of-Speech Tagger for Malay Social Media Texts

Cited by: 7
Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Malaysia
Source
GEMA ONLINE JOURNAL OF LANGUAGE STUDIES | 2018 / Vol. 18 / Issue 04
Keywords
part-of-speech; informal Malay text; Malay POS tagger; Malay tweet; QTAG;
DOI
10.17576/gema-2018-1804-09
Chinese Library Classification number
H [Language, Linguistics];
Discipline classification code
05;
Abstract
Processing the meaning of words in social media texts, such as tweets, is a challenging task in natural language processing. Malay tweets are no exception because they exhibit distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; the borrowing of foreign-language terms into Malay; and the use of mixed languages, abbreviations, spelling errors and mistakes in sentence structure. Tagging the word class of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing works on Malay part-of-speech (POS) tagging are based only on standard Malay and formal texts and are thus unsuitable for tagging tweet texts. A POS tagging model for non-standardised Malay tweets must therefore be developed. This study aims to design and implement a non-standardised Malay POS model for tweets and to assess it on the basis of word tagging accuracy on test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagger called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% for normalised and unnormalised test datasets, respectively.
Pages: 124-142
Number of pages: 19
Related papers
50 in total
  • [21] Quantification of part-of-speech relationships for aspect sentiment triplet extraction
    Wang, Jiacan
    Liu, Jianhua
    Ke, Tianci
    Chen, Kewei
    Cai, Zijie
    Xu, Ge
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2025,
  • [22] A Comparison of Different Part-of-Speech Tagging Technique for Text in Bahasa Indonesia
    Zuli, Ahmad
    Hartanto, Amrullah Rudy
    Mustika, I. Wayan
    2017 7TH INTERNATIONAL ANNUAL ENGINEERING SEMINAR (INAES), 2017, : 6 - 10
  • [23] Chinese Chunking Based on Coarse-grained Part-of-Speech Features
    Sun, Guang-Lu
    Xue, Yibo
    Xu, Zhiming
    Lang, Fei
    2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 226 - 229
  • [24] Part-of-Speech Tags Guide Low-Resource Machine Translation
    Kadeer, Zaokere
    Yi, Nian
    Wumaier, Aishan
    ELECTRONICS, 2023, 12 (16)
  • [25] Automatic Clustering of Part-of-speech for Vocabulary Divided PLSA Language Model
    Suzuki, Motoyuki
    Kuriyama, Naoto
    Ito, Akinori
    Makino, Shozo
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 289 - +
  • [26] Sentiment Classification Based on Part-of-Speech and Self-Attention Mechanism
    Cheng, Kefei
    Yue, Yanan
    Song, Zhiwen
    IEEE ACCESS, 2020, 8 : 16387 - 16396
  • [27] SP-BTM: A Specific Part-of-speech BTM for Service Clustering
    Hu, Rong
    Liu, Jianxun
    Wen, Yiping
    2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020, : 1050 - 1057
  • [28] Improving word vector model with part-of-speech and dependency grammar information
    Deng, Chunhui
    Lai, Gangming
    Deng, Huifang
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2020, 5 (04) : 276 - 282
  • [29] Automatic Genre Classification via N-grams of Part-of-Speech Tags
    Tang, Xiaoyan
    Cao, Jing
    CURRENT WORK IN CORPUS LINGUISTICS: WORKING WITH TRADITIONALLY- CONCEIVED CORPORA AND BEYOND (CILC2015), 2015, 198 : 474 - 478
  • [30] Enhancing context representations with part-of-speech information and neighboring signals for question classification
    Peizhu Gong
    Jin Liu
    Yurong Xie
    Minjie Liu
    Xiliang Zhang
    Complex & Intelligent Systems, 2023, 9 : 6191 - 6209