Active Learning Strategies for Textual Dataset-Automatic Labelling

Cited by: 0
Authors
Daudpota, Sher Muhammad [1]
Hassan, Saif [1]
Alkhurayyif, Yazeed [2]
Alqahtani, Abdullah Saleh [3, 4]
Aziz, Muhammad Haris [5]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan
[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia
[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia
[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia
[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 76, No. 2
Keywords
Active learning; automatic labelling; textual datasets; classification
DOI
10.32604/cmc.2023.034157
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling that data for use in supervised machine learning remains expensive and requires tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabelled textual data using active learning in combination with deep learning. More specifically, the study assesses the performance of different active learning strategies for automatic labelling of textual datasets at the sentence and document levels. To this end, two sets of experiments were performed on publicly available datasets. In the first set, we randomly choose a subset of instances from the training dataset and train a deep neural network, then assess its performance on the test set. In the second set, we replace the random selection with different active learning strategies for choosing the training subset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
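The two selection schemes compared in the abstract can be sketched in a few lines. The abstract does not name the active learning strategies that were evaluated, so the example below assumes two common uncertainty-sampling scores (least confidence and entropy) purely for illustration; the function names and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math
import random


def random_selection(pool_size, k, seed=0):
    """Baseline: choose k pool indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)


def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def entropy(probs):
    """Uncertainty score: Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, k, score_fn):
    """Return the indices of the k pool instances with the highest uncertainty."""
    scores = [score_fn(p) for p in pool_probs]
    ranked = sorted(range(len(pool_probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy class probabilities that a model might assign to four unlabelled texts
# (illustrative values only, not results from the paper).
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]

rand_idx = random_selection(len(pool_probs), k=2)
lc_idx = select_most_uncertain(pool_probs, k=2, score_fn=least_confidence)
ent_idx = select_most_uncertain(pool_probs, k=2, score_fn=entropy)

print(lc_idx)   # [3, 1]
print(ent_idx)  # [3, 1]
```

In a full experiment, the selected indices would define the training subset for the model, and the same model would be trained once on the random subset and once on the uncertainty-based subset so that test-set performance can be compared.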
Pages: 1409 - 1422
Page count: 14
Related papers
50 in total
  • [21] An automatic essay correction for an active learning environment
    Frinhani, Cristovao Lima
    Andrade de Freitas, Sergio Antonio
    Fernandes, Mauricio Vidotti
    Canedo, Edna Dias
    2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016
  • [22] A Unified Framework for Automatic Distributed Active Learning
    Chen, Xu
    Wujek, Brett
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 9774 - 9786
  • [23] Active Learning Based Weak Supervision for Textual Survey Response Classification
    Patil, Sangameshwar
    Ravindran, B.
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II, 2015, 9042 : 309 - 320
  • [24] The Impact Of Active Learning Strategies On Education And Learning Achievements
    Alkasem, Ahmad Abdulrahman
    Mohamed, Yuslina
    IJAZ ARABI JOURNAL OF ARABIC LEARNING, 2025, 8 (01) : 31 - 40
  • [25] Automatic selection of learning bias for active sampling
    dos Santos, Davi P.
    de Carvalho, Andre C. P. L. F.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 55 - 60
  • [26] A deep learning method for automatic SMS spam classification: Performance of learning algorithms on indigenous dataset
    Abayomi-Alli, Olusola
    Misra, Sanjay
    Abayomi-Alli, Adebayo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (17)
  • [27] Comparison of Active Learning Strategies and Proposal of a Multiclass Hypothesis Space Search
    dos Santos, Davi P.
    de Carvalho, Andre C. P. L. F.
    HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, HAIS 2014, 2014, 8480 : 618 - 629
  • [28] Active Learning to Speed-Up the Training Process for Dialogue Act Labelling
    Ghigi, Fabrizio
    Martinez-Hinarejos, Carlos-D.
    Benedi, Jose-Miguel
    HUMAN LANGUAGE TECHNOLOGY CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, 2014, 8387 : 253 - 263
  • [29] Evidential uncertainty sampling strategies for active learning
    Hoarau, Arthur
    Lemaire, Vincent
    Le Gall, Yolande
    Dubois, Jean-Christophe
    Martin, Arnaud
    MACHINE LEARNING, 2024, 113 (09) : 6453 - 6474
  • [30] Distributed Active Learning Strategies on Edge Computing
    Qian, Jia
    Hansen, Lars Kai
    Gochhayat, Sarada Prasad
    2019 6TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (IEEE CSCLOUD 2019) / 2019 5TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (IEEE EDGECOM 2019), 2019, : 221 - 226