Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Cited by: 14
Authors
Kumar, S. [1]
Kumar, M. Anand [1]
Soman, K. P. [1]
Affiliation
[1] Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Centre for Computational Engineering and Networking (CEN), Coimbatore, Tamil Nadu, India
Keywords
Part-of-speech tagging; deep learning; recurrent neural network; long short-term memory; gated recurrent unit; bidirectional LSTM
DOI
10.1515/jisys-2017-0520
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The paper addresses the problem of part-of-speech (POS) tagging for Malayalam tweets. The conversational style of social media posts makes it difficult to apply a general-purpose POS tagset to such text. For this work, a tagset comprising 17 coarse tags was designed, and 9915 tweets were tagged manually for experimentation and evaluation. The tagged data were evaluated using sequential deep learning methods: recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). The models were trained on the tagged tweets at both word level and character level. The experiments were evaluated using precision, recall, f1-measure, and accuracy. At word level, the GRU-based sequential model gave the highest f1-measure, 0.9254; at character level, the BLSTM-based sequential model gave the highest f1-measure, 0.8739. To choose a suitable number of hidden states, the number was varied over 4, 16, 32, and 64, and a model was trained for each setting. Increasing the number of hidden states was observed to improve the tagger model. This is an initial effort to perform POS tagging of Malayalam Twitter data using deep learning sequential models.
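As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes token-level accuracy together with precision, recall, and f1-measure for a tagged sequence. It is not the authors' evaluation code: the function name `tagging_scores`, the choice of macro-averaging over tags, and the example tag names are all assumptions made for this sketch, since the abstract does not state how the per-tag scores were aggregated.

```python
from collections import Counter


def tagging_scores(gold, pred):
    """Token-level accuracy plus macro-averaged precision, recall and F1.

    Macro-averaging over tags is an assumption of this sketch; the abstract
    does not say which averaging scheme was used.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one token is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct tag
        else:
            fp[p] += 1      # predicted tag p was wrong
            fn[g] += 1      # gold tag g was missed

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical tags for illustration only; the paper's 17-tag set is not listed here.
gold_tags = ["NOUN", "VERB", "NOUN", "ADJ"]
pred_tags = ["NOUN", "NOUN", "NOUN", "ADJ"]
scores = tagging_scores(gold_tags, pred_tags)
```

In a hidden-state sweep such as the one described (4, 16, 32, and 64), a function like this would be called once per trained model so that the settings can be compared on the same held-out tweets.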
Pages: 423-435
Number of pages: 13