Active Learning Strategies for Textual Dataset-Automatic Labelling

Authors
Daudpota, Sher Muhammad [1]
Hassan, Saif [1]
Alkhurayyif, Yazeed [2]
Alqahtani, Abdullah Saleh [3, 4]
Aziz, Muhammad Haris [5]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan
[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia
[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia
[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia
[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 76 / No. 02
Keywords
Active learning; automatic labelling; textual datasets; classification
DOI
10.32604/cmc.2023.034157
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Internet revolution has produced abundant data from various sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling that data for use in supervised machine learning remains expensive and requires tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabelled textual data with the support of active learning in combination with deep learning. More specifically, this study assesses the performance of different active learning strategies in automatic labelling of textual datasets at the sentence and document levels. To achieve this objective, different experiments have been performed on the publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network to assess performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose a subset of the training dataset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
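The two selection schemes compared in the abstract can be illustrated with a minimal sketch. The abstract does not name the active learning strategies used, so least-confidence uncertainty sampling is shown here purely as an assumed example of one common strategy. The function names, the toy probabilities, and the use of NumPy are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def random_selection(pool_size, k, seed=0):
    """Baseline: pick k pool indices uniformly at random, without replacement."""
    rng = np.random.default_rng(seed)
    return rng.choice(pool_size, size=k, replace=False)


def least_confidence_selection(probs, k):
    """Assumed strategy: pick the k instances the model is least sure about.

    probs is an (n_instances, n_classes) array of predicted class
    probabilities for the unlabelled pool.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)  # probability of the top predicted class
    # argsort is ascending, so the first k indices are the least confident
    return np.argsort(confidence, kind="stable")[:k]


# Hypothetical predictions for five unlabelled texts in a binary task.
pool_probs = np.array([
    [0.98, 0.02],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.90, 0.10],
])

picked = least_confidence_selection(pool_probs, k=2)
print(picked)  # indices of the two instances nearest the decision boundary
```

In an experiment of the kind described, the indices returned by either function would determine which training instances are used to fit the model, with the test set held fixed so the two selection schemes can be compared.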
Pages: 1409-1422
Page count: 14