Active Learning Strategies for Textual Dataset-Automatic Labelling

Cited by: 0
Authors
Daudpota, Sher Muhammad [1]
Hassan, Saif [1]
Alkhurayyif, Yazeed [2]
Alqahtani, Abdullah Saleh [3, 4]
Aziz, Muhammad Haris [5]
Affiliations
[1] Sukkur IBA Univ, Dept Comp Sci, Sukkur 65200, Pakistan
[2] Shaqra Univ, Al Quwayiyah Coll Sci & Humanities, Shaqra 15526, Saudi Arabia
[3] King Saud Univ, Self Dev Skills Dept, Common First Year Deanship, Riyadh 12373, Saudi Arabia
[4] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, STCs Artificial Intelligence Chair, Riyadh 11451, Saudi Arabia
[5] Univ Sargodha, Coll Engn & Technol, Sargodha 40100, Pakistan
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 76 / Issue 02
Keywords
Active learning; automatic labelling; textual datasets; classification
DOI
10.32604/cmc.2023.034157
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although data availability is no longer an issue, labelling data for use in supervised machine learning remains expensive and requires tedious human effort. This study proposes a strategy for automatically labelling unlabelled textual data by combining active learning with deep learning. More specifically, it assesses the performance of different active learning strategies in the automatic labelling of textual datasets at the sentence and document levels. To this end, two sets of experiments were performed on publicly available datasets. In the first set, we randomly choose a subset of instances from the training dataset and train a deep neural network, then assess its performance on the test set. In the second set, we replace the random selection with different active learning strategies to choose the training subset, train the same model, and reassess its performance on the test set. The experimental results suggest that, for automatic labelling, active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets.
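The contrast between the two sets of experiments can be illustrated with a minimal Python sketch. The abstract does not name the active learning strategies that were evaluated, so the two shown here (least confidence and entropy, both common forms of uncertainty sampling) are assumptions chosen for illustration, as are all function names and the toy probabilities. The sketch only covers the selection step: given a model's predicted class probabilities for a pool of instances, it picks which ones to send for labelling.

```python
import math
import random


def random_selection(pool_size, k, seed=0):
    """Baseline: choose k pool indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)


def least_confidence_selection(probs, k):
    """Choose the k instances whose highest class probability is lowest.

    probs is a list of per-instance class probability lists (hypothetical
    model output); a low maximum means the model is unsure of its prediction.
    """
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return order[:k]


def entropy_selection(probs, k):
    """Choose the k instances with the highest predictive entropy."""
    def entropy(p):
        # Skip zero probabilities, since 0 * log(0) is taken as 0.
        return -sum(q * math.log(q) for q in p if q > 0)

    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    return order[:k]


# Toy pool of four instances with binary class probabilities (made-up values).
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],  # moderately confident
    [0.51, 0.49],  # most uncertain
]

print(random_selection(len(pool_probs), 2))
print(least_confidence_selection(pool_probs, 2))  # [3, 1]
print(entropy_selection(pool_probs, 2))           # [3, 1]
```

In a full loop, the selected instances would be labelled, added to the training subset, and the model retrained before the next round of selection; that part is omitted here because the abstract gives no details about it.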
Pages: 1409-1422
Number of pages: 14