Hidden Markov Model Based Part of Speech Tagging for Nepali Language

Authors
Paul, Abhijit [1]
Purkayastha, Bipul Syam [1]
Sarkar, Sunita [1]
Affiliation
[1] Assam Univ Silchar, Dept Comp Sci, Silchar, India
Keywords
NLP; POS; HMM
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Natural Language Processing (NLP) is mainly concerned with developing computational models and tools for processing human (natural) language. Part of Speech (POS) tagging is a well-studied topic and one of the most fundamental preprocessing steps in NLP for any language. Natural language processing of Nepali still lacks significant research effort in India. POS tagging is a necessary component of most NLP applications in Nepali: it analyses the construction and behavior of the language and can be used to develop automated tools for language processing. A survey of the literature and related work shows that little work has been done on POS tagging for Nepali in India, owing to the lack of a comprehensive tagged corpus or correct handwritten rules. This paper discusses Hidden Markov Model (HMM) based POS tagging for Nepali. The HMM is the most widely used statistical model for POS tagging and requires little knowledge about the language apart from contextual information. The tagger was evaluated on corpora collected from TDIL (Technology Development for Indian Languages) using the BIS tagset of 42 tags. The tagset has been designed to meet the morpho-syntactic requirements of Nepali. Apart from the corpora and the tagset, the implementation uses the Python programming language and the NLTK (Natural Language Toolkit) library. The tagger achieves an accuracy of over 96% for known words; research on unknown words is still continuing.
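To illustrate how an HMM tagger relies only on contextual information (tag-to-tag transitions) and word-to-tag statistics, the following is a minimal sketch of a bigram HMM with Viterbi decoding. It is not the authors' implementation: it uses plain Python rather than NLTK, a toy English corpus rather than the TDIL Nepali corpora, placeholder tags (DET, NOUN, VERB) rather than the BIS tagset, and add-one smoothing as an assumed way of giving unknown words a non-zero probability. The function names train_hmm, log_prob and viterbi are hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"  # assumed sentence-start pseudo-tag


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            previous = tag
    return transitions, emissions


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, transitions, emissions):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    tags = list(emissions)
    vocab = {w for tag in emissions for w in emissions[tag]}
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )

    # Backtrack from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))


# Toy training data with placeholder tags (not the BIS tagset).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], transitions, emissions))
```

In this sketch a word never seen in training receives the same smoothed emission probability under every tag, so its tag is decided by the surrounding tag context alone. This is one simple way to handle unknown words, not necessarily the approach taken in the paper.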
Pages: 149-156
Page count: 8