A machine learning approach to POS tagging

Cited by: 21
Authors
Màrquez, L [1]
Padró, L [1]
Rodríguez, H [1]
Affiliation
[1] Univ Politecn Cataluna, Dept Llenguatges & Sist Informat, Barcelona 08034, Spain
Keywords
part of speech tagging; corpus-based and statistical language modeling; decision trees induction; constraint satisfaction; relaxation labeling
DOI
10.1023/A:1007673816718
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We have applied the inductive learning of statistical decision trees and relaxation labeling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part Of Speech Tagging). The learning process is supervised and obtains a language model oriented to resolve POS ambiguities, consisting of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired decision trees have been directly used in a tagger that is both relatively simple and fast, and which has been tested and evaluated on the Wall Street Journal (WSJ) corpus with competitive accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labeling-based tagger. In this direction, we describe a tagger which is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. Simultaneously, we address the problem of tagging when only limited training material is available, which is crucial in any process of constructing, from scratch, an annotated corpus. We show that high levels of accuracy can be achieved with our system in this situation, and report some results obtained when using it to develop a 5.5-million-word Spanish corpus from scratch.
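The relaxation labeling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' tagger: the `relax` function, the constraint format, the compatibility values, and the two-word example are all assumptions made for illustration, and the update rule is one commonly used form of relaxation labeling rather than a formula taken from this record. Each word starts with uniform weights over its candidate tags, and each iteration raises the weight of tags supported by their context and lowers the weight of tags that conflict with it.

```python
from typing import Dict, List, Tuple

# (tag, offset, context_tag) -> compatibility in [-1, 1].
# Toy values chosen for illustration; they are not taken from the paper.
Constraints = Dict[Tuple[str, int, str], float]


def relax(candidates: List[List[str]], constraints: Constraints,
          iterations: int = 20) -> List[Dict[str, float]]:
    """Iteratively reweight each word's candidate tags using context constraints."""
    # Start from a uniform distribution over each word's candidate tags.
    weights = [{t: 1.0 / len(tags) for t in tags} for tags in candidates]
    for _ in range(iterations):
        updated = []
        for i, current in enumerate(weights):
            scores = {}
            for tag, weight in current.items():
                support = 0.0
                for (t, offset, ctx_tag), compat in constraints.items():
                    j = i + offset
                    if t == tag and 0 <= j < len(weights):
                        # A constraint counts in proportion to the current
                        # weight of the context tag it refers to.
                        support += compat * weights[j].get(ctx_tag, 0.0)
                support = max(-1.0, min(1.0, support))  # keep 1 + support >= 0
                scores[tag] = weight * (1.0 + support)
            total = sum(scores.values())
            if total > 0:
                updated.append({t: s / total for t, s in scores.items()})
            else:
                # Every tag was fully penalised: keep the previous weights.
                updated.append(dict(current))
        weights = updated
    return weights


# Toy input: "the can", where "can" is ambiguous between noun (NN) and modal (MD).
candidates = [["DT"], ["NN", "MD"]]
constraints = {("NN", -1, "DT"): 0.8, ("MD", -1, "DT"): -0.8}
result = relax(candidates, constraints)
best_tags = [max(w, key=w.get) for w in result]  # highest-weight tag per word
```

In this toy run the determiner on the left pushes the weight of `NN` up and the weight of `MD` down at every iteration. Constraints from any source (n-grams, hand-written rules, or rules read off decision-tree branches) could be expressed in the same dictionary, which is the kind of flexibility the abstract attributes to the relaxation-labeling tagger.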
Pages: 59-91
Number of pages: 33
Source
Lluís Màrquez, Lluís Padró, Horacio Rodríguez. Machine Learning, 2000, 39: 59-91