Deep Active Learning for Address Parsing Tasks with BERT

Cited by: 1
Authors
Guler, Berkay [1 ]
Aygun, Betul [2 ]
Gerek, Aydin [2 ]
Gurel, Alaeddin Selcuk [2 ]
Affiliations
[1] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA 92697 USA
[2] Huawei Turkey Res & Dev Ctr, Istanbul, Turkiye
Source
2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2023
Keywords
active learning; token classification; address data; BERT;
DOI
10.1109/SIU59756.2023.10223996
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning models tend to perform better with larger datasets. As data handling costs decrease, researchers have the means to gather and store vast amounts of unlabeled data. Supervised learning, however, requires training data to be labeled by annotators, and high annotation costs make it challenging to label an optimal portion of the available data. One proposed method to mitigate this problem is active learning (AL). AL strategies use a machine learning model to select the most informative and representative samples among unlabeled data points. Here, we demonstrate the effectiveness of uncertainty-based active learning strategies, including a new strategy, for address parsing with a BERT model on an in-house Arabic address dataset manually annotated for two different tasks. We compare the AL methods with random-sampling and longest-sentence baselines. We show that the usefulness of AL strategies depends heavily on dataset characteristics, with AL being less effective on datasets with fewer classes. We conclude that AL for address parsing with BERT decreases annotation costs when measured by the number of queries. However, because AL methods tend to select longer queries, some strategies may increase labeling costs when measured by the total number of words.
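As an illustration of the kind of uncertainty-based strategy the abstract compares, the following is a minimal Python sketch of least-confidence sampling for token classification with a BERT model. The checkpoint name, label count, batch size, and function names are illustrative placeholders, not the paper's in-house Arabic address setup or its new strategy.

# Minimal sketch (PyTorch + Hugging Face Transformers) of least-confidence
# uncertainty sampling for token classification. Checkpoint and label count
# are placeholders, not the paper's in-house Arabic address model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-multilingual-cased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=9)  # 9 is hypothetical
model.eval()

@torch.no_grad()
def least_confidence_score(sentence: str) -> float:
    # 1 minus the mean over tokens of the top class probability:
    # a higher score means the model is less confident on this sentence.
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    probs = torch.softmax(model(**enc).logits, dim=-1)  # (1, tokens, classes)
    return 1.0 - probs.max(dim=-1).values.mean().item()

def select_queries(unlabeled: list[str], k: int = 32) -> list[str]:
    # Rank unlabeled sentences by uncertainty and return the top k
    # to send to human annotators in the next AL round.
    return sorted(unlabeled, key=least_confidence_score, reverse=True)[:k]

Averaging the per-token confidence rather than summing it normalizes for sentence length; a sum-based score would favor long sentences, which is one plausible source of the length bias the abstract reports for some strategies.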
Pages: 4