Active Learning Strategies for Textual Dataset-Automatic Labelling

Cited by: 0
Authors
Daudpota, Sher Muhammad [1]
Hassan, Saif [1]
Alkhurayyif, Yazeed [2]
Alqahtani, Abdullah Saleh [3, 4]
Aziz, Muhammad Haris [5]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan
[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia
[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia
[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia
[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 76 / Issue 02
Keywords
Active learning; automatic labelling; textual datasets; classification
DOI
10.32604/cmc.2023.034157
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling data for use in supervised machine learning is still an expensive process that involves tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabelled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in the automatic labelling of textual datasets at the sentence and document levels. To achieve this objective, several experiments were performed on a publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network, then assess its performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose the subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
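The contrast between the two sets of experiments can be illustrated with a minimal sketch. The abstract does not name the active learning strategies that were evaluated, so the two shown here (least confidence and entropy sampling) are assumptions chosen because they are common uncertainty-based strategies, not necessarily the ones used in the study. All function names and the probability values are hypothetical.

```python
import math
import random


def random_selection(pool_size, k, seed=0):
    """Baseline: pick k pool indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)


def least_confidence_selection(probs, k):
    """Pick the k instances whose highest class probability is lowest."""
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return order[:k]


def entropy_selection(probs, k):
    """Pick the k instances with the highest predictive entropy."""
    def entropy(p):
        # Terms with zero probability contribute nothing to the entropy.
        return -sum(x * math.log(x) for x in p if x > 0)

    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return order[:k]


# Hypothetical class probabilities that a model assigns to five unlabelled texts.
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.90, 0.10],
]

rand_idx = random_selection(len(pool_probs), 2)
lc_idx = least_confidence_selection(pool_probs, 2)
ent_idx = entropy_selection(pool_probs, 2)
```

In this toy pool, both uncertainty-based strategies pick the two texts the model is least sure about (indices 3 and 1), whereas random selection ignores the model's predictions entirely. In a full loop, the selected instances would be labelled, added to the training subset, and the model retrained before the next selection round.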
Pages: 1409-1422
Number of pages: 14