Using hierarchical text classification to investigate the utility of machine learning in automating online analyses of wildlife exploitation

Cited by: 7
Authors
Hunter, Sara Bronwen [1]
Mathews, Fiona [1]
Weeds, Julie [2]
Affiliations
[1] Univ Sussex, Sch Life Sci, Brighton BN1 9QG, England
[2] Univ Sussex, Sch Engn & Informat, Brighton BN1 9QJ, England
Keywords
Machine learning; Natural language processing; iEcology; Wildlife exploitation; Digital conservation; Social media;
DOI
10.1016/j.ecoinf.2023.102076
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the 'state-of-the-art' transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. 
We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools is determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
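The hierarchical filtering idea in the abstract (three binary tasks with increasingly specific relevancy criteria, each applied only to texts that passed the previous one) can be sketched as below. This is a minimal illustration, not the authors' implementation: the keyword-based stages are hypothetical stand-ins for trained classifiers, and the keywords, example texts, and function names are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# A binary classifier is modelled as any callable mapping a text to True/False.
BinaryClassifier = Callable[[str], bool]


def keyword_stage(keywords: Sequence[str]) -> BinaryClassifier:
    """Hypothetical stand-in for a trained model.

    Marks a text as relevant if any keyword occurs as a substring
    (crude on purpose; a real stage would be a fitted classifier).
    """
    lowered = [k.lower() for k in keywords]
    return lambda text: any(k in text.lower() for k in lowered)


def run_cascade(texts: Sequence[str],
                stages: Sequence[BinaryClassifier]) -> List[int]:
    """Return, for each text, how many consecutive stages it passed.

    A text is only shown to stage i+1 if stage i labelled it relevant,
    so later (more specific) stages see a much smaller dataset.
    """
    depths = []
    for text in texts:
        depth = 0
        for stage in stages:
            if not stage(text):
                break  # filtered out; later stages are never evaluated
            depth += 1
        depths.append(depth)
    return depths


# Assumed, illustrative relevancy criteria (not taken from the paper).
stages = [
    keyword_stage(["bat"]),                        # stage 1: mentions bats at all
    keyword_stage(["hunt", "kill", "cull"]),       # stage 2: mentions exploitation
    keyword_stage(["market", "sold", "bushmeat"]), # stage 3: mentions trade
]

texts = [
    "Cricket bat sale this weekend",
    "Fruit bats hunted and sold at a local market",
    "Bats roosting in the church tower",
    "Weather update for Brighton",
]

depths = run_cascade(texts, stages)
print(depths)  # [1, 3, 1, 0]
```

The first text shows why later stages matter: a broad first filter lets through an ambiguous match ("cricket bat"), which the more specific second stage then removes.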
Pages: 11