Active Learning Strategies for Textual Dataset-Automatic Labelling

Times cited: 0

Authors:

Daudpota, Sher Muhammad [1]

Hassan, Saif [1]

Alkhurayyif, Yazeed [2]

Alqahtani, Abdullah Saleh [3, 4]

Aziz, Muhammad Haris [5]

Affiliations:

[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan

[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia

[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia

[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia

[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 76 / No. 02

Keywords:

Active learning; automatic labelling; textual datasets; classification

DOI:

10.32604/cmc.2023.034157

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling that data for use in supervised machine learning remains an expensive process that requires tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabelled textual data using active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies for the automatic labelling of textual datasets at the sentence and document levels. To achieve this objective, experiments were performed on publicly available datasets. In the first set of experiments, we randomly choose a subset of instances from the training dataset, train a deep neural network on it, and assess its performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose the training subset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield performance improvements of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
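The abstract contrasts random subset selection with active-learning selection, but this record does not name the specific strategies the authors compared. The sketch below is illustrative only and is not the authors' implementation. It assumes a pool-based setting and uses two widely used uncertainty-sampling scores, entropy and least confidence. The function names and the toy probabilities are hypothetical. The sketch covers only the query step, which decides which instances get labelled. Training the deep neural network on the chosen instances is omitted.

```python
import math
import random
from typing import Callable, List, Sequence


def random_selection(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Baseline (first experiment set): pick k pool indices uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)  # sampling without replacement


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the top-class probability; a higher value means a less confident model."""
    return 1.0 - max(probs)


def uncertainty_selection(pool_probs: Sequence[Sequence[float]], k: int,
                          score: Callable[[Sequence[float]], float] = entropy) -> List[int]:
    """Hypothetical active-learning query: return the k most uncertain pool items.

    pool_probs[i] is the model's predicted class distribution for unlabelled item i.
    Items are ranked by `score` (higher = more uncertain); ties keep pool order.
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical softmax outputs for five unlabelled sentences (two classes).
    pool = [
        [0.98, 0.02],  # confident
        [0.55, 0.45],  # uncertain
        [0.90, 0.10],
        [0.50, 0.50],  # maximally uncertain
        [0.70, 0.30],
    ]
    print("random:          ", random_selection(len(pool), 2, seed=42))
    print("entropy:         ", uncertainty_selection(pool, 2))
    print("least confidence:", uncertainty_selection(pool, 2, score=least_confidence))
```

On the toy pool, both uncertainty scores select items 3 and 1, which are the two near-50/50 predictions. The random baseline ignores the model entirely. In a full loop, the selected items would be labelled and added to the training set. The network would then be retrained and the pool re-scored. The `score` parameter is kept swappable so that different query strategies can be compared against the same model, in the spirit of the second experiment set described above.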


Pages: 1409-1422

Number of pages: 14
