Active Learning Strategies for Textual Dataset-Automatic Labelling

Cited by: 0

Authors:

Daudpota, Sher Muhammad [1]

Hassan, Saif [1]

Alkhurayyif, Yazeed [2]

Alqahtani, Abdullah Saleh [3, 4]

Aziz, Muhammad Haris [5]

Affiliations:

[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan

[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia

[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia

[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia

[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 76, Issue 02

Keywords:

Active learning; automatic labelling; textual datasets; classification

DOI:

10.32604/cmc.2023.034157

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The Internet revolution has resulted in abundant data from various sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling data for use in supervised machine learning is still an expensive process that involves tedious human effort. The overall purpose of this study is to propose a strategy to automatically label unlabeled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in automatic labelling of the textual dataset at sentence and document levels. To achieve this objective, different experiments have been performed on the publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network to assess performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose a subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that different active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
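
The contrast between the two sets of experiments can be illustrated with a minimal sketch. The abstract does not name the active learning strategies that were evaluated, so least-confidence uncertainty sampling is used here purely as an assumed, commonly used example; the function names and the toy probability values are hypothetical and are not taken from the paper.

```python
import numpy as np


def random_selection(n_pool, k, rng):
    """Baseline: pick k pool indices uniformly at random, without replacement."""
    return rng.choice(n_pool, size=k, replace=False)


def least_confidence_selection(probs, k):
    """Assumed strategy: pick the k instances whose top class probability is lowest."""
    confidence = probs.max(axis=1)  # model confidence per instance
    return np.argsort(confidence, kind="stable")[:k]


rng = np.random.default_rng(0)

# Hypothetical class probabilities predicted for five unlabeled texts.
pool_probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.99, 0.01],
])

baseline_idx = random_selection(len(pool_probs), k=2, rng=rng)
active_idx = least_confidence_selection(pool_probs, k=2)
print("random baseline picks:", baseline_idx.tolist())
print("least-confidence picks:", active_idx.tolist())
```

In this sketch the random baseline ignores the model entirely, while the least-confidence rule selects the instances the model is least sure about (rows 3 and 1 in the toy data). In either case the selected subset would then be used to train the same network before it is evaluated on the test set.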

Pages: 1409-1422

Page count: 14
