Predictive keywords: Using machine learning to explain document characteristics

Cited by: 2
Authors
Kyroelaeinen, Aki-Juhani [1]
Laippala, Veronika [1]
Affiliations
[1] Univ Turku, Sch Languages & Translat Studies, Turku, Finland
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Volume 5
Funding
Academy of Finland
Keywords
keyness; keyword; corpus linguistics; support vector machines; machine learning; CLASSIFICATION; SELECTION; FORESTS;
DOI
10.3389/frai.2022.975729
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyword analysis is widely used in corpus linguistics to explore the characteristics of the discourse domain associated with a set of texts. However, one of the challenges facing this method is evaluating the quality of the keywords. Here, we propose casting keyword analysis as a prediction problem with the goal of discriminating the texts associated with the target corpus from those of the reference corpus. We demonstrate that, when using linear support vector machines, this approach can be used not only to quantify the discrimination between the two corpora, but also to extract keywords. To evaluate the keywords, we develop a systematic and rigorous approach anchored to the concepts of usefulness and relevance used in machine learning. The extracted keywords are compared with the recently proposed text dispersion keyness measure. We demonstrate that our approach extracts keywords that are highly useful and linguistically relevant, capturing the characteristics of their discourse domain.
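As a rough illustration of the idea in the abstract (not the authors' implementation), the sketch below trains a small linear support vector machine to separate a target corpus from a reference corpus, then ranks words by their learned weights so that the most positive weights act as candidate keywords for the target corpus. Everything specific here is an assumption made for illustration: the toy corpora, the binary word-presence features, the helper names `train_linear_svm` and `score`, and the hyperparameters `lam`, `lr` and `epochs`.

```python
# Toy corpora (invented for illustration; not data from the paper).
target_texts = [
    "the match ended with a late goal",
    "the team scored a goal in the match",
    "a goal decided the match",
]
reference_texts = [
    "the market closed with a late rally",
    "the bank raised a rate in the market",
    "a rate decided the market",
]

# Tokenise and label: +1 for the target corpus, -1 for the reference corpus.
docs = [text.lower().split() for text in target_texts + reference_texts]
labels = [1] * len(target_texts) + [-1] * len(reference_texts)

# Binary bag-of-words: each document becomes a sorted list of word indices.
vocab = sorted({tok for doc in docs for tok in doc})
index = {word: j for j, word in enumerate(vocab)}
rows = [sorted({index[tok] for tok in doc}) for doc in docs]


def train_linear_svm(rows, labels, n_features, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * n_features
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(rows, labels):
            margin = y * (sum(w[j] for j in row) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge subgradient
                for j in row:
                    grad_w[j] -= y
                grad_b -= y
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(row, w, b):
    """Signed distance-like score; positive means 'target corpus'."""
    return sum(w[j] for j in row) + b


w, b = train_linear_svm(rows, labels, len(vocab))

# Discrimination between the corpora, here measured on the training texts only.
predictions = [1 if score(row, w, b) > 0 else -1 for row in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Rank words by weight: most positive first (target keywords),
# most negative last (words typical of the reference corpus).
ranked = sorted(zip(vocab, w), key=lambda pair: (-pair[1], pair[0]))

print("accuracy:", accuracy)
print("target keywords:", [word for word, _ in ranked[:2]])
print("reference keywords:", [word for word, _ in ranked[-2:]])
```

In this toy setting, words shared by both corpora (such as "the" or "a") should end up with weights near zero, while words confined to one corpus receive the largest positive or negative weights. A realistic analysis would measure discrimination on held-out texts rather than training texts, and would more likely rely on an established library implementation of linear SVMs than on a hand-written optimiser.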
Pages: 23
Related papers
50 in total
  • [21] Classification Rules Explain Machine Learning
    Cristani, Matteo
    Olivieri, Francesco
    Workneh, Tewabe Chekole
    Pasetto, Luca
    Tomazzoli, Claudio
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 3, 2022, : 897 - 904
  • [22] Using Machine Learning Approach to Identify Synonyms for Document Mining
    Trappey, Amy J. C.
    Trappey, Charles V.
    Wu, Jheng-Long
    Tsai, Kevin T.-C.
    TRANSDISCIPLINARY ENGINEERING FOR COMPLEX SOCIO-TECHNICAL SYSTEMS, 2019, 10 : 509 - 518
  • [23] PREDICTIVE MODELING OF STUDENT SUCCESS USING MACHINE LEARNING
    Hoti, Arber H.
    Zenuni, Xhemal
    Ajdari, Jaumin
    Ismaili, Florije
    INTERNATIONAL JOURNAL ON INFORMATION TECHNOLOGIES AND SECURITY, 2025, 17 (01): : 37 - 46
  • [24] Predictive Analysis of Postpartum Depression Using Machine Learning
    Kim, Hyunkyoung
    HEALTHCARE, 2025, 13 (08)
  • [25] Identifying Predictive Features in Drug Response Using Machine Learning: Opportunities and Challenges
    Vidyasagar, Mathukumalli
    ANNUAL REVIEW OF PHARMACOLOGY AND TOXICOLOGY, VOL 55, 2015, 55 : 15 - 34
  • [26] Predictive Analysis for Personal Loans by Using Machine Learning
    Huang, Hui-I.
    Wang, Chou-Wen
    Wu, Chin-Wen
    HCI IN BUSINESS, GOVERNMENT AND ORGANIZATIONS, PT I, HCIBGO 2024, 2024, 14720 : 187 - 199
  • [27] Predictive Modeling of Software Behavior Using Machine Learning
    Saksupawattanakul, C.
    Vatanawood, W.
    IEEE ACCESS, 2024, 12 : 120584 - 120596
  • [28] A comparative study of machine learning techniques for suicide attempts predictive model
    Nordin, Noratikah
    Zainol, Zurinahni
    Noor, Mohd Halim Mohd
    Fong, Chan Lai
    HEALTH INFORMATICS JOURNAL, 2021, 27 (01)
  • [29] Predictive Modeling of HR Dynamics Using Machine Learning
    Birzniece, Ilze
    Andersone, Ilze
    Nikitenko, Agris
    Zvirbule, Liga
    PROCEEDINGS OF 2022 7TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2022, 2022, : 17 - 23
  • [30] Machine Learning Algorithms for Document Clustering and Fraud Detection
    Yaram, Suresh
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON DATA SCIENCE & ENGINEERING (ICDSE), 2016, : 103 - 108