Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Cited by: 14
Authors
Kumar, S. [1]
Kumar, M. Anand [1]
Soman, K. P. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Ctr Computat Engn & Networking CEN, Coimbatore, Tamil Nadu, India
Keywords
Part-of-speech tagging; deep learning; recurrent neural network; long short-term memory; gated recurrent unit; bidirectional LSTM
DOI
10.1515/jisys-2017-0520
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses part-of-speech (POS) tagging for Malayalam tweets. The conversational style of social media text makes a general-purpose POS tagset hard to apply. For this work, a tagset of 17 coarse tags was designed, and 9915 tweets were tagged manually for experimentation and evaluation. Sequential deep learning models were trained on the tagged tweets at both word level and character level: recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). The experiments were evaluated using precision, recall, F1-measure, and accuracy. At word level, the GRU-based model gave the highest F1-measure, 0.9254; at character level, the BLSTM-based model gave the highest, 0.8739. To choose a suitable number of hidden states, the value was varied over 4, 16, 32, and 64, with a model trained for each setting. Increasing the number of hidden states improved the tagger. This is an initial effort at POS tagging of Malayalam Twitter data using deep learning sequential models.
Pages: 423-435
Page count: 13
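The abstract reports precision, recall, F1-measure, and accuracy but does not say how they were computed or averaged. As a rough illustration only, the sketch below computes per-tag precision, recall, and F1, plus token accuracy and an unweighted (macro) F1 average, from two flat lists of tags. The function name `tagging_scores`, the macro averaging, and the example tags are assumptions made for this sketch; they are not the paper's code or its 17-tag tagset.

```python
from collections import Counter


def tagging_scores(gold, pred):
    """Return (per_tag, accuracy, macro_f1) for two equal-length tag lists.

    per_tag maps each tag to a (precision, recall, f1) tuple.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for tag g
        else:
            fp[p] += 1   # tag p was predicted but is wrong
            fn[g] += 1   # tag g was missed

    per_tag = {}
    for tag in sorted(set(gold) | set(pred)):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_tag[tag] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    # Assumption: unweighted mean over tags (macro average).
    macro_f1 = (
        sum(f1 for _, _, f1 in per_tag.values()) / len(per_tag)
        if per_tag else 0.0
    )
    return per_tag, accuracy, macro_f1


# Made-up example: one VERB token is mistagged as NOUN.
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
per_tag, accuracy, macro_f1 = tagging_scores(gold, pred)
```

In this made-up example, three of the four tokens are correct, so accuracy is 0.75, while the missed VERB pulls the macro F1 down to about 0.6. A weighted or micro average would give different numbers, which is why the averaging choice matters when comparing reported scores.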