Part-of-Speech Tagger for Malay Social Media Texts

Times cited: 7
Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliation
[1] Universiti Kebangsaan Malaysia, Faculty of Information Science and Technology, Bangi, Malaysia
Source
GEMA Online Journal of Language Studies, 2018, Vol. 18, No. 4
Keywords
part-of-speech; informal Malay text; Malay POS tagger; Malay tweet; QTAG
DOI
10.17576/gema-2018-1804-09
Chinese Library Classification
H [Language and writing]
Subject classification code
05
Abstract
Processing the meaning of words in social media texts, such as tweets, is a challenging problem in natural language processing. Malay tweets are no exception: they exhibit distinct linguistic phenomena, such as the use of dialects from each Malaysian state; the borrowing of foreign-language terms into Malay; and the use of mixed languages, abbreviations, spelling errors and errors in sentence structure. Tagging the word classes of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing work on Malay part-of-speech (POS) tagging is based only on standard Malay and formal texts and is therefore unsuitable for tagging tweets. A POS model for tagging tweets written in non-standardised Malay must thus be developed. This study aims to design and implement such a model and to evaluate it by word-tagging accuracy on test data consisting of unnormalised and normalised tweet texts. The proposed solution adopts QTAG, a probabilistic POS tagger. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% on the normalised test dataset and 88.8% on the unnormalised test dataset.
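The abstract does not spell out how QTAG scores tags, so the following Python sketch illustrates the general idea of probabilistic POS tagging, and of the word-tagging accuracy used for evaluation, with a swapped-in technique: a bigram hidden Markov model decoded with the Viterbi algorithm. It is not QTAG's algorithm. The tag names (`PRON`, `NEG`, `VERB`, `NOUN`), the toy training and test sentences, the `NORMALISE` lexicon and all function and class names are hypothetical assumptions for illustration only; the lexicon merely hints at the kind of dictionary-based normalisation a system might apply before tagging.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon mapping informal Malay tweet spellings to standard forms.
NORMALISE = {"sy": "saya", "x": "tidak", "tak": "tidak",
             "yg": "yang", "dgn": "dengan", "utk": "untuk"}


def normalise(tokens):
    """Lower-case each token and replace known informal spellings (one token in, one out)."""
    return [NORMALISE.get(t.lower(), t.lower()) for t in tokens]


class BigramHMMTagger:
    """Bigram HMM tagger decoded with Viterbi; an illustrative stand-in, not QTAG itself."""

    def __init__(self, tagged_sentences):
        self.trans = defaultdict(Counter)  # previous tag -> counts of next tag
        self.emit = defaultdict(Counter)   # tag -> counts of words
        for sentence in tagged_sentences:
            prev = "<s>"
            for word, tag in sentence:
                self.trans[prev][tag] += 1
                self.emit[tag][word.lower()] += 1
                prev = tag
        self.tags = sorted(self.emit)
        self.vocab = {w for counts in self.emit.values() for w in counts}

    def _log_trans(self, prev, tag):
        counts = self.trans[prev]
        # Add-one smoothing over the tag set.
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def _log_emit(self, tag, word):
        counts = self.emit[tag]
        # Add-one smoothing; the extra +1 reserves probability mass for unseen words.
        return math.log((counts[word] + 1) / (sum(counts.values()) + len(self.vocab) + 1))

    def tag(self, words):
        words = [w.lower() for w in words]
        if not words:
            return []
        # Each column maps tag -> (best log-probability of a path ending in that tag, previous tag).
        columns = [{t: (self._log_trans("<s>", t) + self._log_emit(t, words[0]), None)
                    for t in self.tags}]
        for word in words[1:]:
            prev_col = columns[-1]
            col = {}
            for t in self.tags:
                best_prev = max(self.tags, key=lambda p: prev_col[p][0] + self._log_trans(p, t))
                score = prev_col[best_prev][0] + self._log_trans(best_prev, t)
                col[t] = (score + self._log_emit(t, word), best_prev)
            columns.append(col)
        # Backtrack from the highest-scoring final tag.
        last = max(self.tags, key=lambda t: columns[-1][t][0])
        path = [last]
        for col in reversed(columns[1:]):
            path.append(col[path[-1]][1])
        return list(zip(words, reversed(path)))


def accuracy(tagger, gold_sentences, normalise_input=False):
    """Token-level tagging accuracy: correctly tagged tokens / all tokens."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [w for w, _ in sentence]
        if normalise_input:
            words = normalise(words)
        for (_, gold), (_, predicted) in zip(sentence, tagger.tag(words)):
            correct += gold == predicted
            total += 1
    return correct / total


# Toy training data with hypothetical tags.
train = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("suka", "VERB"), ("kucing", "NOUN")],
    [("saya", "PRON"), ("tidak", "NEG"), ("suka", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("tidak", "NEG"), ("makan", "VERB"), ("kucing", "NOUN")],
]
tagger = BigramHMMTagger(train)

# Toy tweet-style test sentence, gold-tagged on its original (unnormalised) tokens.
test = [[("sy", "PRON"), ("x", "NEG"), ("makan", "VERB"), ("nasi", "NOUN")]]
print("unnormalised:", accuracy(tagger, test))
print("normalised:", accuracy(tagger, test, normalise_input=True))
```

Because `normalise` maps exactly one token to one token, the gold tags stay aligned with the normalised words; a normaliser that splits or merges tokens would need an explicit alignment step before accuracy can be computed this way.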
Pages: 124–142
Number of pages: 19