Predictive keywords: Using machine learning to explain document characteristics

Cited by: 2
Authors
Kyroelaeinen, Aki-Juhani [1]
Laippala, Veronika [1]
Affiliations
[1] Univ Turku, Sch Languages & Translat Studies, Turku, Finland
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Vol. 5
Funding
Academy of Finland;
Keywords
keyness; keyword; corpus linguistics; support vector machines; machine learning; CLASSIFICATION; SELECTION; FORESTS;
DOI
10.3389/frai.2022.975729
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyword analysis is widely used in corpus linguistics to explore the characteristics of the discourse domain associated with a set of texts. However, one of the challenges facing this method is evaluating the quality of the keywords. Here, we propose casting keyword analysis as a prediction problem with the goal of discriminating the texts associated with the target corpus from those of the reference corpus. We demonstrate that, when using linear support vector machines, this approach can be used not only to quantify the discrimination between the two corpora, but also to extract keywords. To evaluate the keywords, we develop a systematic and rigorous approach anchored to the concepts of usefulness and relevance used in machine learning. The extracted keywords are compared with the recently proposed text dispersion keyness measure. We demonstrate that our approach extracts keywords that are highly useful and linguistically relevant, capturing the characteristics of their discourse domain.
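The idea in the abstract, training a linear classifier to separate a target corpus from a reference corpus and then reading keywords off the learned weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpora, the binary word-presence features, the plain subgradient-descent trainer, and every function name and hyperparameter below are assumptions made for illustration only.

```python
# Hypothetical toy corpora; the study itself uses real corpora.
target = ["mix the flour and sugar",
          "bake the cake with flour",
          "add sugar and bake"]
reference = ["the team won the match",
             "the match ended and the team left",
             "fans watched the team"]

docs = [text.split() for text in target + reference]
labels = [1] * len(target) + [-1] * len(reference)  # +1 target, -1 reference


def train_linear_svm(docs, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit a linear SVM (hinge loss, L2 penalty) by subgradient descent.

    Features are binary: 1 if a word occurs in the document, else 0.
    All hyperparameter values are illustrative assumptions.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    # Each document becomes the list of feature indices that are "on".
    rows = [sorted({index[tok] for tok in doc}) for doc in docs]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for feats, y in zip(rows, labels):
            margin = y * (sum(weights[i] for i in feats) + bias)
            # L2 penalty: shrink every weight a little on each step.
            weights = [w * (1.0 - lr * reg) for w in weights]
            # Hinge loss: update only when the margin is violated.
            if margin < 1.0:
                for i in feats:
                    weights[i] += lr * y
                bias += lr * y
    return vocab, weights, bias


def top_keywords(vocab, weights, k=3):
    """Return up to k words with the largest positive weights."""
    ranked = sorted(zip(weights, vocab), key=lambda pair: -pair[0])
    return [tok for w, tok in ranked[:k] if w > 0]


vocab, weights, bias = train_linear_svm(docs, labels)
keywords = top_keywords(vocab, weights)
print(keywords)
```

Words with large positive weights push a document toward the target class, so they serve as keyword candidates, while words shared by both corpora (such as "the") are pulled down by the reference documents. To quantify how well the two corpora are discriminated, the classifier would be scored on held-out documents rather than on the training texts used here.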
Pages: 23
Related papers
50 in total
  • [31] Predictive Churn Analysis with Machine Learning Methods
    Gunay, Melike
    Ensari, Tolga
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [32] A machine learning approach for predictive warehouse design
    Tufano, Alessandro
    Accorsi, Riccardo
    Manzini, Riccardo
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2022, 119 (3-4) : 2369 - 2392
  • [33] Machine learning-based keywords extraction for scientific literature
    Wu, Chunguo
    Marchese, Maurizio
    Jiang, Jingqing
    Ivanyukovich, Alexander
    Liang, Yanchun
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2007, 13 (10) : 1471 - 1483
  • [34] Item Difficulty Prediction Using Item Text Features: Comparison of Predictive Performance across Machine-Learning Algorithms
    Stepanek, Lubomir
    Dlouha, Jana
    Martinkova, Patricia
    MATHEMATICS, 2023, 11 (19)
  • [35] Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques
    Chadha, Akshay
    Dara, Rozita
    Pearl, David L.
    Sharif, Shayan
    Poljak, Zvonimir
    PREVENTIVE VETERINARY MEDICINE, 2023, 216
  • [36] A predictive model for die roll height in fine blanking using machine learning methods
    Stanke, Joachim
    Feuerhack, Andreas
    Trauth, Daniel
    Mattfeld, Patrick
    Klocke, Fritz
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON METAL FORMING METAL FORMING 2018, 2018, 15 : 570 - 577
  • [37] Predictive modelling of mineral prospectivity using satellite remote sensing and machine learning algorithms
    Mahboob, Muhammad Ahsan
    Celik, Turgay
    Genc, Bekir
    REMOTE SENSING APPLICATIONS-SOCIETY AND ENVIRONMENT, 2024, 36
  • [38] Predictive Recommining: Learning Relations Between Event Log Characteristics and Machine Learning Approaches for Supporting Predictive Process Monitoring
    Drodt, Christoph
    Weinzierl, Sven
    Matzner, Martin
    Delfmann, Patrick
    INTELLIGENT INFORMATION SYSTEMS, CAISE FORUM 2023, 2023, 477 : 69 - 76
  • [39] Enhancing electrical panel anomaly detection for predictive maintenance with machine learning and IoT
    Peksen, Muhammed Fatih
    Yurtsever, Ulas
    Uyaroglu, Yilmaz
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 96 : 112 - 123
  • [40] Predictive modeling of engine emissions using machine learning: A review
    Khurana, Shivansh
    Saxena, Shubham
    Jain, Sanyam
    Dixit, Ankur
    MATERIALS TODAY-PROCEEDINGS, 2021, 38 : 280 - 284