Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Cited by: 14
Authors
Kumar, S. [1]
Kumar, M. Anand [1]
Soman, K. P. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Centre for Computational Engineering and Networking (CEN), Coimbatore, Tamil Nadu, India
Keywords
Part-of-speech tagging; deep learning; recurrent neural network; long short-term memory; gated recurrent unit; bidirectional LSTM
DOI
10.1515/jisys-2017-0520
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses part-of-speech (POS) tagging for Malayalam tweets. The conversational style of social media text makes a general-purpose POS tagset difficult to apply. For this work, a tagset of 17 coarse tags was designed, and 9915 tweets were tagged manually for experiments and evaluation. Sequential deep learning models were evaluated on the tagged data: recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). The models were trained on the tagged tweets at both word level and character level, and were evaluated using precision, recall, F1-measure, and accuracy. At word level, the GRU-based model gave the highest F1-measure, 0.9254; at character level, the BLSTM-based model gave the highest F1-measure, 0.8739. To choose a suitable number of hidden states, the value was varied over 4, 16, 32, and 64, with training performed for each setting. Increasing the number of hidden states improved the tagger. This is an initial work on POS tagging of Malayalam Twitter data using deep learning sequential models.
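The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes macro-averaging over tags, which the abstract does not specify, and uses a hypothetical function name (`tagging_scores`) and hypothetical toy tags rather than the paper's 17-tag set or data.

```python
from collections import Counter


def tagging_scores(gold, pred):
    """Accuracy plus macro-averaged precision, recall and F1 for two
    equal-length flat lists of tags (macro-averaging is an assumption)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for tag g
        else:
            fp[p] += 1      # tag p was predicted but wrong
            fn[g] += 1      # tag g was missed

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy example, not data from the paper.
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
scores = tagging_scores(gold, pred)
print(scores)
```

In practice the per-token gold and predicted tags of all test tweets would be flattened into the two lists before scoring; a micro- or weighted average would give different numbers on the same predictions.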
Pages: 423-435
Page count: 13