Using hierarchical text classification to investigate the utility of machine learning in automating online analyses of wildlife exploitation

Cited by: 7
Authors
Hunter, Sara Bronwen [1]
Mathews, Fiona [1]
Weeds, Julie [2]
Affiliations
[1] Univ Sussex, Sch Life Sci, Brighton BN1 9QG, England
[2] Univ Sussex, Sch Engn & Informat, Brighton BN1 9QJ, England
Keywords
Machine learning; Natural language processing; iEcology; Wildlife exploitation; Digital conservation; Social media
DOI
10.1016/j.ecoinf.2023.102076
Chinese Library Classification number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the 'state-of-the-art' transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. 
We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools are determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
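The abstract describes two ideas that may be easier to follow in code: a cascade of three binary classifiers with increasingly specific relevancy criteria, and stratification of training data by the presence of key search terms. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The stage names, the keyword lists, the example texts, and the helpers `keyword_classifier`, `hierarchical_filter` and `stratified_sample` are all hypothetical. Simple keyword rules stand in for the trained models (including the transformer-based ones) that the paper evaluates.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A binary classifier is any function mapping a text to True (relevant) or False.
BinaryClassifier = Callable[[str], bool]


def keyword_classifier(terms: Sequence[str]) -> BinaryClassifier:
    """Hypothetical stand-in for a trained model: True if any term occurs in the text."""
    lowered = [t.lower() for t in terms]
    return lambda text: any(t in text.lower() for t in lowered)


def hierarchical_filter(
    texts: Sequence[str], stages: Sequence[Tuple[str, BinaryClassifier]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Apply the stages in order; a text reaches stage k only if it passed stages 1..k-1."""
    rejected: Dict[str, List[str]] = defaultdict(list)
    remaining = list(texts)
    for name, classifier in stages:
        kept = []
        for text in remaining:
            (kept if classifier(text) else rejected[name]).append(text)
        remaining = kept
    return remaining, dict(rejected)


def stratified_sample(
    texts: Sequence[str], search_terms: Sequence[str], per_term: int, seed: int = 0
) -> List[str]:
    """Draw up to per_term texts per search term so that rare terms are still represented."""
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = {term: [] for term in search_terms}
    for text in texts:
        lowered = text.lower()
        for term in search_terms:
            if term.lower() in lowered:
                groups[term].append(text)
                break  # assign each text to its first matching term only
    sample: List[str] = []
    for term in search_terms:
        members = groups[term]
        rng.shuffle(members)
        sample.extend(members[:per_term])
    return sample


# Assumed relevancy criteria, ordered from broad to specific (illustrative only).
stages = [
    ("mentions_bats", keyword_classifier(["bat"])),
    ("human_interaction", keyword_classifier(["hunt", "kill", "sold", "caught"])),
    ("exploitation", keyword_classifier(["bushmeat", "hunted", "killed"])),
]

texts = [
    "Local football results and weather",
    "Bats roost in the old church tower",
    "A bat was caught and released by researchers",
    "Fruit bats hunted and sold as bushmeat",
]

relevant, rejected = hierarchical_filter(texts, stages)

# Stratified draw: the frequent term ("hunt") does not crowd out the rare one ("guano").
pool = [f"hunt report {i}" for i in range(10)] + [f"guano harvest {i}" for i in range(3)]
training_sample = stratified_sample(pool, ["hunt", "guano"], per_term=2)
```

In this sketch each stage only sees texts accepted by the previous one, so the later and more specific classifiers work on a much smaller pool, and the `rejected` dictionary records the stage at which each text was discarded. The stratified draw caps the contribution of each search term, which is one simple way to keep less frequent topics present in a labelled training set.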
Number of pages: 11