Predictive keywords: Using machine learning to explain document characteristics

Cited by: 2
Authors
Kyroelaeinen, Aki-Juhani [1]
Laippala, Veronika [1]
Affiliations
[1] Univ Turku, Sch Languages & Translat Studies, Turku, Finland
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Vol. 5
Funding
Academy of Finland;
Keywords
keyness; keyword; corpus linguistics; support vector machines; machine learning; CLASSIFICATION; SELECTION; FORESTS;
DOI
10.3389/frai.2022.975729
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When exploring the characteristics of a discourse domain associated with texts, keyword analysis is widely used in corpus linguistics. However, one of the challenges facing this method is the evaluation of the quality of the keywords. Here, we propose casting keyword analysis as a prediction problem with the goal of discriminating the texts associated with the target corpus from those of the reference corpus. We demonstrate that, when using linear support vector machines, this approach can be used not only to quantify the discrimination between the two corpora, but also to extract keywords. To evaluate the keywords, we develop a systematic and rigorous approach anchored to the concepts of usefulness and relevance used in machine learning. The extracted keywords are compared with the recently proposed text dispersion keyness measure. We demonstrate that our approach extracts keywords that are highly useful and linguistically relevant, capturing the characteristics of their discourse domain.
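
The general idea in the abstract, training a linear classifier to separate target texts from reference texts and then reading keywords off the learned weights, can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the two miniature corpora, the binary bag-of-words features, the hyperparameters, and the helper names (`tokenize`, `train_linear_svm`, `predict`) are hypothetical, and the hand-written subgradient-descent trainer merely stands in for a library SVM implementation. It does not reproduce the authors' features, tuning, or evaluation procedure.

```python
# Toy sketch (not the authors' pipeline): a linear SVM separates a target
# corpus from a reference corpus, and positive weights suggest keywords.

def tokenize(text):
    return text.lower().split()

def train_linear_svm(docs, labels, epochs=200, lr=0.1, lam=0.01):
    """Minimise L2-regularised hinge loss by full-batch subgradient descent."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    # Binary bag-of-words: 1.0 if the token occurs in the document.
    X = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in set(tokenize(doc)):
            row[index[tok]] = 1.0
        X.append(row)
    w, b, n = [0.0] * len(vocab), 0.0, len(docs)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(X, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute hinge subgradient
                for j, xj in enumerate(x):
                    if xj:
                        grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return dict(zip(vocab, w)), b

# Hypothetical miniature corpora: +1 = target (recipes), -1 = reference (sport).
target = ["mix the flour and sugar", "bake the cake with flour",
          "stir sugar into the butter"]
reference = ["the team won the match", "the player scored a goal",
             "fans cheered the team"]
docs = target + reference
labels = [1] * len(target) + [-1] * len(reference)

weights, bias = train_linear_svm(docs, labels)

def predict(doc):
    score = sum(weights.get(tok, 0.0) for tok in set(tokenize(doc))) + bias
    return 1 if score >= 0 else -1

# Candidate keywords: tokens ranked by how strongly they push towards the target.
keywords = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
for token, weight in keywords[:5]:
    print(f"{token}\t{weight:.3f}")
```

In this sketch, tokens shared by several target texts accumulate larger positive weights than tokens confined to a single text, while a token such as "the", which occurs in every text of both corpora, is pushed up and down by the two classes alike and so gains little net weight. How well the classifier separates the two corpora, on held-out texts in a realistic setting, gives a rough indication of how far the corpora can be discriminated at all.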
Pages: 23
Related papers
50 in total
  • [1] Predictive modeling for wine authenticity using a machine learning approach
    Costa, Nattane Luiza da
    Valentin, Leonardo A.
    Castro, Inar Alves
    Barbosa, Rommel Melgaco
    ARTIFICIAL INTELLIGENCE IN AGRICULTURE, 2021, 5 : 157 - 162
  • [2] Predictive modeling of flow characteristics in supersonic separators using machine learning
    Bahadornia, Atabak
    Mojaddam, Mohammad
    FUEL, 2024, 374
  • [3] Developing predictive models of construction fatality characteristics using machine learning
    Zhu, Jianbo
    Shi, Qianqian
    Li, Qiming
    Shou, Wenchi
    Li, Haijiang
    Wu, Peng
    SAFETY SCIENCE, 2023, 164
  • [4] Developing predictive models of construction fatality characteristics using machine learning
    Zhu J.
    Shi Q.
    Li Q.
    Shou W.
    Li H.
    Wu P.
    Safety Science, 2023, 164
  • [5] Drinking Addiction Predictive Model Using Body Characteristics Machine Learning Approach
    Karmakar, Mousumi
    Al Kafi, Md Abdullah
    Sabbir, Wahid
    Afridi, Arafat Sahin
    Raza, Dewan Mamun
    ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 364 - 383
  • [6] Relation between Titles and Keywords in Japanese Academic Papers using Quantitative Analysis and Machine Learning
    Murata, Masaki
    Morimoto, Natsumi
    COMPUTACION Y SISTEMAS, 2019, 23 (03): : 959 - 968
  • [7] Optimizing the predictive power of depression screenings using machine learning
    Terhorst, Yannik
    Sander, Lasse B.
    Ebert, David D.
    Baumeister, Harald
    DIGITAL HEALTH, 2023, 9
  • [8] PREDICTIVE MAINTENANCE AND MONITORING OF INDUSTRIAL MACHINE USING MACHINE LEARNING
    Masani, Kausha I.
    Oza, Parita
    Agrawal, Smita
    SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2019, 20 (04): : 663 - 668
  • [9] Combinations in predictive analytics by using machine learning
    Gulay, Emrah
    Duru, Okan
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 2097 - 2103
  • [10] Web Document Classification by Keywords Using Random Forests
    Klassen, Myungsook
    Paturi, Nikhila
    NETWORKED DIGITAL TECHNOLOGIES, PT 2, 2010, 88 : 256 - 261