The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach

Cited by: 16
Authors
Sebok, Miklos [1]
Kacsuk, Zoltan [1, 2]
Affiliations
[1] Hungarian Acad Sci, Ctr Social Sci, Budapest, Hungary
[2] Hsch Medien, Stuttgart, Germany
Keywords
machine learning; statistical analysis of texts; Comparative Agendas Project; multiclass classification; automated content analysis
DOI
10.1017/pan.2020.27
Chinese Library Classification
D0 [Political science, political theory]
Subject classification code
0302; 030201
Abstract
In this article, we present a machine learning-based solution for matching the performance of the gold standard of double-blind human coding when it comes to content analysis in comparative politics. We combine a quantitative text analysis approach with supervised learning and limited human resources in order to classify the front-page articles of a leading Hungarian daily newspaper based on their full text. Our goal was to assign items in our dataset to one of 21 policy topics based on the codebook of the Comparative Agendas Project. The classification of the imbalanced classes of topics was handled by a hybrid binary snowball workflow. This relies on limited human resources as well as supervised learning; it simplifies the multiclass problem to one of binary choice; and it is based on a snowball approach as we augment the training set with machine-classified observations after each successful round and also between corpora. Our results show that our approach provided better precision results (of over 80% for most topic codes) than what is customary for human coders and most computer-assisted coding projects. Nevertheless, this high precision came at the expense of a relatively low, below 60%, share of labeled articles.
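The workflow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word-count scorer is a toy stand-in for a real supervised classifier, and the names `train_binary`, `score`, `binary_snowball`, and the `threshold` parameter are hypothetical. It shows only the three ideas the abstract names: a human-coded seed set, one binary (in-topic vs. out-of-topic) decision per topic, and a training set that grows with machine-classified articles after each round.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_binary(docs, labels, topic):
    """Count word frequencies for in-topic vs. out-of-topic documents."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == topic else neg).update(tokenize(doc))
    return pos, neg


def score(model, doc):
    """Share of tokens seen more often in-topic than out-of-topic (toy scorer)."""
    pos, neg = model
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if pos[t] > neg[t]) / len(tokens)


def binary_snowball(seed, unlabeled, topics, threshold=0.6, max_rounds=5):
    """Grow the labeled set round by round from a human-coded seed.

    An article is accepted only when exactly one binary model claims it;
    everything else stays unlabeled, trading coverage for precision.
    """
    labeled = list(seed)          # (text, topic) pairs; the seed is human-coded
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        docs = [d for d, _ in labeled]
        labels = [l for _, l in labeled]
        models = {t: train_binary(docs, labels, t) for t in topics}
        accepted, still_open = [], []
        for doc in remaining:
            votes = [t for t in topics if score(models[t], doc) >= threshold]
            if len(votes) == 1:
                accepted.append((doc, votes[0]))
            else:
                still_open.append(doc)
        if not accepted:          # no progress this round, so stop
            break
        labeled.extend(accepted)  # snowball: machine labels join the training set
        remaining = still_open
    return labeled, remaining


# Invented toy data, for illustration only.
seed = [("budget tax deficit", "economy"), ("hospital doctor nurse", "health")]
pool = ["tax budget cut", "doctor hospital strike", "football match result"]
labeled, remaining = binary_snowball(seed, pool, ["economy", "health"])
```

In this toy run, the article that no binary model claims stays in `remaining`, which mirrors the trade-off the abstract reports: high precision at the cost of a lower share of labeled articles.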
Pages: 236-249
Page count: 14
Related papers
50 in total
  • [31] A Machine Learning Approach to Classification of Okra
    Diop, Papa Moussa
    Takamoto, Jin
    Nakamura, Yuji
    Nakamura, Morikazu
    35TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2020), 2020, : 254 - 257
  • [32] A New Method for Binary Classification of Proteins with Machine Learning
    Perri, Damiano
    Simonetti, Marco
    Lombardi, Andrea
    Faginas-Lago, Noelia
    Gervasi, Osvaldo
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2021, PT X, 2021, 12958 : 388 - 397
  • [33] Machine Learning With the Sugeno Integral: The Case of Binary Classification
    Abbaszadeh, Sadegh
    Huellermeier, Eyke
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2021, 29 (12) : 3723 - 3733
  • [34] Binary Classification of Heart Disease Based on Differential Evolution-Optimised Machine Learning Approach
    Egling, Theodore Nicholas Richard
    Mbuyu, Sumbwanyambe
    Wang, Zenghui
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2024, 15 (04) : 467 - 479
  • [35] Classification of legal articles based on bio ethics related to machine learning
    Zhou L.
    Journal of Commercial Biotechnology, 2021, 26 (04) : 171 - 178
  • [36] Multi-layer hybrid machine learning techniques for anomalies detection and classification approach
    Aziz, Amira Sayed A.
    Hassanien, Aboul Ella
    Hanafy, Sanaa El-Ola
    Tolba, M. F.
    2013 13TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2013, : 215 - 220
  • [37] Automated classification of tropical shrub species: a hybrid of leaf shape and machine learning approach
    Murat, Miraemiliana
    Chang, Siow-Wee
    Abu, Arpah
    Yap, Hwa Jen
    Yong, Kien-Thai
    PEERJ, 2017, 5
  • [38] Binary Malware image Classification using Machine Learning with Local Binary Pattern
    Luo, Jhu-Sin
    Lo, Dan Chia-Tien
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4664 - 4667
  • [39] AN EMPIRICAL STUDY ON THE CLASSIFICATION OF CHINESE NEWS ARTICLES BY MACHINE LEARNING AND DEEP LEARNING TECHNIQUES
    Huang, Chuen-Min
    Jiang, Yi-Jun
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2019, : 462 - 467
  • [40] A Machine Learning Approach to classify News Articles based on Location
    Rao, Vignesh
    Sachdev, Jayant
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT SUSTAINABLE SYSTEMS (ICISS 2017), 2017, : 863 - 867