Predictive keywords: Using machine learning to explain document characteristics

Times cited: 2
Authors
Kyröläinen, Aki-Juhani [1]
Laippala, Veronika [1]
Affiliation
[1] Univ Turku, Sch Languages & Translat Studies, Turku, Finland
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Vol. 5
Funding
Academy of Finland
Keywords
keyness; keyword; corpus linguistics; support vector machines; machine learning; classification; selection; forests
DOI
10.3389/frai.2022.975729
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When exploring the characteristics of a discourse domain associated with texts, keyword analysis is widely used in corpus linguistics. However, one of the challenges facing this method is evaluating the quality of the keywords. Here, we propose casting keyword analysis as a prediction problem with the goal of discriminating the texts associated with the target corpus from those of the reference corpus. We demonstrate that, when using linear support vector machines, this approach can be used not only to quantify the discrimination between the two corpora but also to extract keywords. To evaluate the keywords, we develop a systematic and rigorous approach anchored to the concepts of usefulness and relevance used in machine learning. The extracted keywords are compared with the recently proposed text dispersion keyness measure. We demonstrate that our approach extracts keywords that are highly useful and linguistically relevant, capturing the characteristics of their discourse domain.
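The core idea of the abstract, training a linear classifier to separate target texts from reference texts and reading keywords off its weights, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes whitespace tokenization, binary word-presence features, and a hand-rolled linear SVM trained by full-batch subgradient descent on the hinge loss (standing in for a library SVM such as scikit-learn's `LinearSVC`). The helper names, the hyperparameters `lam`, `lr`, and `epochs`, and the toy corpora are all hypothetical. The sketch covers keyword extraction only; quantifying discrimination would additionally require held-out classification performance.

```python
"""Keyword extraction cast as a prediction problem (illustrative sketch only).

A linear SVM is trained to separate target-corpus texts (+1) from
reference-corpus texts (-1); word types are then ranked by learned weight.
"""


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real corpora need a proper tokenizer.
    return text.lower().split()


def build_vocab(docs):
    # Map each word type to a feature index (sorted for reproducibility).
    words = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(words)}


def vectorize(doc, vocab):
    # Binary presence features: 1.0 if the word occurs anywhere in the text.
    x = [0.0] * len(vocab)
    for tok in set(tokenize(doc)):
        if tok in vocab:
            x[vocab[tok]] = 1.0
    return x


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Full-batch subgradient descent on the L2-regularized hinge loss:
    #   lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b))
    # The bias b is not regularized.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # margin violated: the hinge term is active
                for j, xj in enumerate(xi):
                    if xj:
                        grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def extract_keywords(target_docs, reference_docs, k=5):
    # Positive weights push a text toward the target corpus,
    # negative weights toward the reference corpus.
    docs = target_docs + reference_docs
    y = [1] * len(target_docs) + [-1] * len(reference_docs)
    vocab = build_vocab(docs)
    X = [vectorize(doc, vocab) for doc in docs]
    w, _ = train_linear_svm(X, y)
    ranked = sorted(vocab, key=lambda word: (-w[vocab[word]], word))
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy corpora, for illustration only.
TARGET = [
    "the recipe needs flour and sugar",
    "bake the cake with sugar",
    "mix flour eggs and sugar then bake",
]
REFERENCE = [
    "the match ended with a late goal",
    "the team scored a goal",
    "fans cheered the goal at the match",
]

if __name__ == "__main__":
    target_kw, reference_kw = extract_keywords(TARGET, REFERENCE, k=3)
    print("target keywords:", target_kw)
    print("reference keywords:", reference_kw)
```

Binary presence features count a word once per text regardless of how often it repeats within that text, which is the same unit that text dispersion keyness works with. This is a design choice for the sketch, not necessarily the feature setup used in the paper.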
Pages: 23