Part-of-Speech Tagger for Malay Social Media Texts

Cited by: 7
Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Malaysia
Source
GEMA ONLINE JOURNAL OF LANGUAGE STUDIES | 2018 / Vol. 18 / No. 04
Keywords
part-of-speech; informal Malay text; Malay POS tagger; Malay tweet; QTAG;
DOI
10.17576/gema-2018-1804-09
Chinese Library Classification (CLC) number
H [Language, Linguistics];
Discipline classification code
05 ;
Abstract
Processing the meaning of words in social media texts, such as tweets, is a challenging task in natural language processing. Malay tweets are no exception because they exhibit distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; the borrowing of foreign-language terms in a Malay-language context; and the use of mixed languages, abbreviations, spelling errors and mistakes in sentence structure. Tagging the word class of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing works on Malay part-of-speech (POS) tagging are based only on standard Malay and formal texts and are therefore unsuitable for tagging tweet texts. A POS tagging model for non-standardised Malay tweets must therefore be developed. This study aims to design and implement a non-standardised Malay POS model for tweets and to assess it on the basis of word tagging accuracy on test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagger called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% for the normalised and unnormalised test datasets, respectively.
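The abstract evaluates the tagger by word tagging accuracy, that is, the share of tokens whose predicted tag matches the gold tag. The following is a minimal sketch of that metric only; it is not the authors' evaluation code and does not implement QTAG. The function name `tagging_accuracy`, the example tokens and the tag labels (`PRON`, `NEG`, `VERB`, `NOUN`, `INTJ`) are hypothetical and are not taken from the paper's tagset or data.

```python
from typing import List, Tuple

# A tagged sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def tagging_accuracy(gold: List[TaggedSentence],
                     predicted: List[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must be aligned token by token")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            total += 1
            if gold_tag == pred_tag:
                correct += 1

    if total == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    return correct / total


# Hypothetical informal-Malay example; tokens and tags are illustrative only.
gold = [
    [("sy", "PRON"), ("x", "NEG"), ("tau", "VERB")],
    [("jom", "INTJ"), ("makan", "VERB")],
]
predicted = [
    [("sy", "PRON"), ("x", "NOUN"), ("tau", "VERB")],  # one tagging error on "x"
    [("jom", "INTJ"), ("makan", "VERB")],
]

accuracy = tagging_accuracy(gold, predicted)
print(f"{accuracy:.1%}")  # 4 of 5 tokens correct
```

Under this definition, running the same function separately on a normalised and an unnormalised test set would give one accuracy figure per dataset, which is the form in which the abstract reports its results.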
Pages: 124-142
Page count: 19